Data protection on the web is an illusion

Two years ago, the Belgian computer science lecturer Yves-Alexandre de Montjoye moved from Boston to London. While looking for a new doctor, he was handed a form at the practice asking him to agree that his health data could be passed on to research institutions and companies, anonymized of course. De Montjoye still remembers one sentence from it: some people believed, the form said, that individual patients could be identified from this data.

What his doctor put in the subjunctive, de Montjoye himself has now put in the indicative. Together with two Belgian colleagues, he investigated whether supposedly anonymous data sets actually allow conclusions to be drawn about individuals. "Our results suggest that even heavily anonymized samples do not meet the standards of data protection laws," the researchers write in the journal Nature.

Their study shows that in eight out of ten cases, gender, date of birth and zip code are enough to identify a person unequivocally: the record can then only belong to one particular Barbara Miller or John Smith from such-and-such a place. And with just 15 pieces of information, the scientists can determine the identity of an American with a probability of 99.98 percent.
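The idea behind such figures can be illustrated with a small sketch in Python. It is not the statistical model used in the study; it merely counts, in a handful of invented records, how many people are the only ones with their particular combination of characteristics. The records and the function name unique_fraction are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy records, invented for illustration:
# (gender, date of birth, zip code)
records = [
    ("F", "1980-03-14", "02138"),
    ("M", "1975-11-02", "02138"),
    ("F", "1980-03-14", "02139"),
    ("M", "1990-07-21", "02139"),
    ("M", "1990-07-21", "02139"),
]

def unique_fraction(rows, columns):
    """Return the share of rows whose values in `columns` occur exactly once."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(keys)

# Gender alone singles out nobody in this toy data set.
gender_only = unique_fraction(records, [0])

# All three characteristics together single out three of the five people.
all_three = unique_fraction(records, [0, 1, 2])

print(f"unique by gender only: {gender_only:.0%}")
print(f"unique by gender, date of birth and zip code: {all_three:.0%}")
```

Each additional characteristic splits the groups further, so the share of people who stand alone in their group can only grow. That is why a data set with hundreds of characteristics per person offers so little cover, even without names.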

Companies sometimes market data sets that each contain hundreds of characteristics of millions of people. Data trading is mostly legal because a large part of the data is not considered to be personal. Regulations such as the EU General Data Protection Regulation therefore do not apply.

Billions of people's data are floating around on the Internet

As early as the mid-1990s, the then governor of Massachusetts, William Weld, involuntarily showed that anonymous data is often an illusion. His state released a database containing the patient records of civil servants. Weld assured the public that no one needed to be afraid: all personal characteristics such as name, address and social security number had been removed.

Shortly afterwards, he found his own patient file in his mailbox: Latanya Sweeney, then a computer science student and now a Harvard professor, had been able to identify Weld in the data set and show that he had promised too much. In 2006, AOL published the searches of 650,000 users; in 2007, Netflix released video recommendations from 500,000 users; in 2016, the Australian government published health data on 2.9 million people. In each case the data was supposedly completely anonymous, and in each case researchers were able to link the information to specific people.

In addition, users of apps and browser extensions can be monitored without realizing it. Criminals steal huge data sets. Billions of users' data is traded on the Internet, legally and illegally. Individuals can hardly prevent this; only stricter laws and better anonymization procedures could. But perhaps researchers like de Montjoye can keep people from voluntarily adding to the treasure trove of data. It can be enough not to tick a box or not to sign a form.