China technology
Local governments in China’s underdeveloped provinces aim to cultivate the growth of data labelling enterprises, which would help foster stable employment and ease poverty. Photo: Shutterstock

Data labelling jobs are coming to China’s underdeveloped regions, but are they sustainable?

  • Guizhou, one of the poorest provinces in China, is morphing into a big data hub for major hi-tech companies

Law school graduate John Li knows the latest developments in China’s autonomous driving industry – sort of.

Li, a 24-year-old Hui Muslim from southwest China’s Guizhou province – a mountainous inland region that is being developed into the country’s next big data hub – has been working on data labelling tasks for autonomous vehicles for nearly two years. Li said he has witnessed the sensor technology for these driverless cars evolve from webcams to laser radars and, at present, a combination of both.

While he has not yet seen a driverless car, Li said he takes great pride in his work and the team of about 100 “taggers” he manages. This team’s job mainly involves drawing boxes on pictures and video frames captured by cameras installed on driverless cars across the country, and then adding annotations to the cars, bikes, pedestrians and various traffic signs on those still photos and videos.

The processed data goes into different data sets that are used to train artificial intelligence (AI) algorithms developed by China’s technology giants, including Alibaba Group Holding and Baidu. E-commerce giant Alibaba is the parent company of the South China Morning Post.

“My work is to prepare clean data for autonomous driving, and I am making a small contribution to the technology’s advancement,” said Li. This is what he would often explain to friends and family who are intrigued by data labelling, an enterprise that found a home in Guizhou just a few years ago.

Other than making annotations, Li and his colleagues sometimes handle other tasks, including transcribing speeches in Chinese dialects (“I have heard dialects from every province and region”), annotating human and animal faces, and making TikTok videos of themselves dancing to music (the dance routines are choreographed ahead of time by unknown parties, so there are no freestyle surprises).

Li works in a big data company tucked away in a small town surrounded by hills and mountains in Guizhou, which is thousands of kilometres away from the country’s artificial intelligence (AI) research and development hubs in Beijing, Shanghai, Hangzhou and Shenzhen.

The data labelling work in Guizhou borrows from the Qiandian Houchang model – literally, “front shop, back factory” – a template adopted by the manufacturing sector. In the late 1970s, companies in Hong Kong began to migrate their basic production and processing en masse to neighbouring Guangdong province. That move helped what was at the time a poor, backward agricultural and fishing region to be transformed into the world’s factory years later. In 1978, Guangdong’s gross domestic product was less than US$13.5 billion. Last year, that number reached US$1.47 trillion. By comparison, Australia’s GDP in 2018 was US$1.32 trillion.

Guangdong’s success has appealed to many inland provinces with an abundant labour force and limited industrialisation. They see the potential of data labelling – which is clean, labour-intensive and supports hi-tech industries – in generating tax revenue, providing stable employment and enabling poverty alleviation.

Guizhou, one of the poorest provinces in China, is now hoping to cash in on the rapid development of the country’s AI industry, as companies look for a relatively cheap and stable labour supply. The provincial government has set up a dedicated department to oversee the development of its big data industry. It has made the promotion of a number of services, including data labelling, collection and processing, one of its priority tasks this year.

The local government of the Qiannan Buyi and Miao Autonomous Prefecture, where Li’s company is based, established a 30 million yuan (US$4.2 million) fund last year that is focused on the promotion of big data enterprises. Registered companies are entitled to various government subsidies for rent, broadband connection and electricity for three years, as well as generous tax breaks for five years.

Xia Bingqing, a scholar on China’s digital economy at Shanghai’s East China Normal University, said Guizhou has provided plenty of support for data labelling as part of its policy to alleviate poverty in the province.

Another landlocked province, Shanxi, has followed Guizhou’s lead in promoting data labelling enterprises. Shanxi, known for its coal, natural gas and crude oil resources, wants to bring in more than 100 data labelling companies and train more than 10,000 workers by 2022. The province aims to be a leader in China’s big data market and develop an industry worth 5 billion yuan by 2025.

As those two provinces offer lower production costs, generous incentives and cheap labour, the AI industry has responded by helping local governments provide stable employment and an improved standard of living.

The Alipay Foundation and Alibaba’s AI Labs recently launched the “A-Idol Initiative”, which provides free training courses on data labelling and curating data for women in the country’s vast underdeveloped areas. The initiative, which began in Guizhou’s Tongren city, where more than 580,000 people live below the poverty line, will cover 10 poverty-stricken counties across China. Alibaba AI Labs also committed at least 10 million yuan in annual data processing orders to ensure the sustainability of the initiative.

To be sure, data labelling initiatives backed by major hi-tech players and guaranteed orders remain few in China, according to industry insiders. Many data labelling enterprises in developed areas are already facing sustainability issues because of the advances being made in machine learning, which means fewer contracts for labour-intensive service providers.

“For our industry, expertise is very important,” said Du Lin, chief executive of Beijing-based data processing start-up Basic Finder. “Those so-called data villages don’t have the expertise. They think they can handle data as long as they can use computers, but they will find their [data labelling] tasks getting more and more difficult.”

Although data labelling, like other outsourced business processes, is labour-intensive, the current push for lower labour costs is taking the industry in the wrong direction, Du said.

Unlike other hi-tech companies, Basic Finder has moved to locate its own data centres in and around Beijing rather than build them in rural areas where labour costs are lower. “Though that would mean higher costs, we think in the long run we can train our staff and have an edge in our core competitiveness,” Du said.

Basic Finder has turned down invitations from local governments to establish operations in rural areas. “If we were to set up cooperation with the local governments, we must make sure there are enough tasks for those local industries,” Du said. “But at the moment, we find that most of our demands could be solved by crowdsourcing because it’s more efficient. There’s no need for a dedicated workforce [in the rural areas].”

A data industry insider known by the pseudonym “Seven”, who once managed a nearly 8,000-member freelance data labelling online collaboration team, agreed with Du’s reluctance to shift data labelling jobs to less developed regions. “Unless big companies support them and keep feeding them orders, or state-owned enterprises take over and look after the staff, they will find themselves unable to support their operation because of the declining orders,” the insider said.

Changes in the data labelling market are happening so fast that it has gone through several stages of the industry life cycle – introduction and growth – which took decades for other enterprises. At the peak of demand for speech data labelling in 2017 (partly because of the proliferation of smart speakers), sometimes three to four thousand people from insider Seven’s team would work on the same task together. But since the second half of last year, those tasks have been disappearing from major crowdsourcing platforms. “Once a mature product is out, there won’t be much need for speech data labelling tasks,” the insider said. “It’s like digging our own graves.”

Seven has disbanded his group of freelancers, as have many other smaller data labelling teams scattered in China’s low-tier cities and rural areas. Li, the law school graduate who manages a data labelling team, said he plans to pursue other endeavours in the next few years.

Basic Finder’s Du, however, sees a silver lining in the industry shake-up, as it weeds out the weak players and enables those who remain to become stronger. “It’s unavoidable, and it’s a good trend,” he said. “This forces the industry to become more professional.”
