A major international media investigation has revealed that a technology firm linked to the Chinese Communist Party has created and mined a global database of 2.4 million individuals – many of them political leaders, scientists, journalists and others in positions of influence – based on their online presence in order to monitor them and their networks. The Indian Express, one of the media outlets that investigated the story, reported earlier today that the firm, Zhenhua Data Information Technology Co. Ltd., describes itself as a pioneer in using big data tools for “hybrid warfare.” Hybrid warfare is a set of techniques through which a country attempts to shape the information environment in a target state by systematic influence and interference operations, alongside the use of other coercive tools, in order to achieve strategic objectives without open warfare.
The Express, along with the Australian Financial Review (AFR), Italy’s Il Foglio and The Daily Telegraph, London, was given access to a massive trove of data that Zhenhua had used to create its database. The data came from an unnamed source close to the company, with help from an American academic, Christopher Balding, who taught at the elite Peking University until 2018 and had since been based in Vietnam. The database contains the names of around 10,000 Indians – including a who’s who of the country’s political and military establishment – as well as more than 35,000 Australians, including Prime Minister Scott Morrison.
AFR notes that out of the 35,000 Australians, the database refers to 656 as “special interest” or “politically exposed” – terms whose exact meaning remains unknown. The special tags suggest that the database was most likely designed to be used by the Chinese government for clandestine targeting of individuals for various intelligence operations. The database also includes names of 52,000 Americans, along with nationals from the U.K., Canada, Indonesia, Malaysia and even Papua New Guinea, AFR reports. It also notes that in one instance the database was used to monitor “the career progression of a U.S. naval officer” who was “flagged as a future commander of a nuclear aircraft carrier.” This suggests that the database was created for predictive analytics of the kind used by social media giants.
The Express notes that Zhenhua was also interested in hundreds of Indian individuals who have been accused of “financial crime, corruption, terrorism, and smuggling of narcotics, gold, arms or wildlife.” This betrays an interest on the part of Chinese intelligence in individuals who could potentially be leveraged or otherwise exploited in specific operations, consistent with similar practices of other intelligence services globally.
While the Zhenhua database will certainly be further analyzed by the media outlets that possess it, three things stand out about the revelations so far.
First, the timing of the data dump. P. Vaidyanathan Iyer, one of the Indian Express’ investigators of the story, tweeted earlier today that he was alerted to the database by an “academician” on May 21. It is likely that all the other newspapers that now possess the database were approached nearly simultaneously. If this is the case, then the timing of the data dump – in the third week of May – must be explained. Note that just days before, Australia’s push for a coronavirus inquiry had attracted Beijing’s wrath. It was also weeks after the first clashes between India and China, in Eastern Ladakh and Northern Sikkim, precursors to the tense ongoing military standoff between the two countries. So, at this point what remains to be understood is the extent to which the source’s decision to dump the data may or may not have been linked with these developments.
Second, it is not known whether the source (through any intermediary) had approached an American media outlet. In many ways, the decision to do so would have been natural given that more than a fifth of the database consists of U.S. citizens and the source had worked with Balding, an American who has since returned to the U.S. out of safety concerns in the run-up to today’s exposé.
Third, that China maintains a database of this sort isn’t particularly surprising. Machine learning from large publicly sourced databases has emerged as a key enabler for intelligence agencies. Technology firms in the national security space maintain a keen interest in the intersection of open-source intelligence and artificial intelligence. For example, the U.S.-based Palantir Technologies, cofounded with start-up capital from the nation’s intelligence community, as well as other American firms such as Recorded Future, seek to utilize big data for intelligence solutions.
Furthermore, Chinese intelligence has a long tradition of following a “Thousand Grains of Sand” strategy by which it has utilized a large number of ordinary Chinese citizens, as well as locals, abroad to source discrete pieces of information which can then be put together to form a larger picture. From Beijing’s perspective it is therefore perfectly natural that the government would maintain large databases of foreign nationals and dynamically track interrelationships even when specific entries – such as family members of targets – may not be individually valuable, either as potential sources or as targets for influence operations. This is all the more so given China’s growing investments in other national security projects that utilize artificial intelligence, such as facial recognition. The question that remains is the extent to which the Chinese Ministry of State Security or other intelligence agencies may have already acted based on the Zhenhua database.