What is data de-identification?

One way of keeping data confidential is to remove personal information. Learn more about this process, known as de-identification.

How is data de-identified?

The information that has been collected in our health records, when combined with data from many other people, provides researchers with hugely important insights into the health of the entire country.

Scotland’s Trusted Research Environments (TREs) – also known as Data Safe Havens, or as Secure Data Environments in England and Wales – provide researchers with a secure computing environment in which to examine large volumes of health data from thousands of different people, while keeping the personal health data of people in Scotland safe.

It is important that data about us remains confidential and secure.

One way to do this is to remove ‘identifiers’ from the data. These can include names, addresses and the unique personal identifier used by the NHS, the Community Health Index (CHI). The record can be de-identified even further by giving it a completely new ‘pseudonym’: a random code, unique to each research project, which keeps all the information in the record together while ensuring that an individual cannot be identified from the record itself.

An extra step is to make sure that nobody handling the data – for instance preparing it for a researcher to look at – has been involved in the process of de-identifying it, and that any ‘key’ which could be used to re-identify individuals is stored completely separately and not accessible to unauthorised persons.
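The steps described above can be illustrated with a short Python sketch. It is not the method used by any particular TRE. The field names (name, address, chi), the deidentify function and the sample records are all made up for illustration. It removes the direct identifiers, gives each person a random pseudonym that differs from one project to the next, and returns the re-identification key as a separate object so that it can be stored away from the research data.

```python
import secrets

# Hypothetical field names, assumed for illustration; real datasets will differ.
IDENTIFIER_FIELDS = {"name", "address", "chi"}


def deidentify(records, id_field="chi"):
    """Strip direct identifiers and attach a random pseudonym to each record.

    Returns (deidentified_records, key). The key maps each pseudonym back to
    the original identifier and should be stored separately from the data.
    """
    pseudonym_for = {}  # original identifier -> pseudonym
    used = set()        # pseudonyms already issued in this project
    deidentified = []
    for record in records:
        original_id = record[id_field]
        if original_id not in pseudonym_for:
            # Random code, not derived from the identifier itself.
            code = secrets.token_hex(8)
            while code in used:  # guard against an (unlikely) collision
                code = secrets.token_hex(8)
            used.add(code)
            pseudonym_for[original_id] = code
        # Keep every field except the direct identifiers.
        cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        cleaned["pseudonym"] = pseudonym_for[original_id]
        deidentified.append(cleaned)
    key = {pseudonym: original for original, pseudonym in pseudonym_for.items()}
    return deidentified, key


# Made-up sample records; the CHI values are placeholders, not real numbers.
records = [
    {"name": "A. Example", "address": "1 Example Street", "chi": "0000000001", "diagnosis": "asthma"},
    {"name": "A. Example", "address": "1 Example Street", "chi": "0000000001", "diagnosis": "eczema"},
    {"name": "B. Sample", "address": "2 Sample Road", "chi": "0000000002", "diagnosis": "asthma"},
]

# Each call stands for a separate research project with its own pseudonyms.
project_a, key_a = deidentify(records)
project_b, key_b = deidentify(records)
```

In this sketch the two records for the same person share a pseudonym within a project, so their information stays together, while the same person receives a different pseudonym in the second project. Only someone holding the key can map a pseudonym back to the original identifier.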

GDPR and ICO considerations for pseudonymisation

The UK GDPR defines pseudonymisation as: ‘processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.’

According to the UK Information Commissioner’s Office (ICO), an independent body set up to uphold information rights, this process of de-identification – known as ‘pseudonymisation’ – is designed to reduce the risk of data being identified. However, it cannot eliminate that risk completely, which is why this is only one of a number of technical and organisational measures taken to protect people’s privacy.

What is anonymisation?

Another way to protect personal data is to anonymise it. Data protection legislation does not explicitly define ‘anonymous information’ but it should be understood as the end result of a process that converts personal data into information to which data protection law no longer applies. In other words, it is information that is out of scope of the UK GDPR and DPA 2018.

Recital 26 of the UK GDPR explains that anonymisation is the way in which you transform personal data into anonymous information, so that individuals cannot be identified and it falls outside the scope of data protection legislation. The term ‘anonymisation’ is used to cover the techniques and approaches outlined below, which prevent the re-identification of individuals. Data can be considered effectively anonymised if it:

Does not relate to an identified or identifiable person, or
Is rendered anonymous in such a way that individuals are not identifiable, through appropriate minimisation of the data and controls over who can access the data and how.

The ICO provides detailed further guidance and information on anonymising personal data in its publication ‘Anonymisation: Managing data protection risk code of practice’.

Learn more about the other ways data is kept secure in this explainer: What are Trusted Research Environments?

Definitions

The ICO offers these definitions to explain the difference between anonymisation and pseudonymisation:

Anonymisation means that individuals are not identifiable and cannot be re-identified by any means reasonably likely to be used (i.e., the risk of re-identification is sufficiently remote). Anonymous information is not personal data and data protection law does not apply.

Pseudonymisation means that individuals are not identifiable from the dataset itself but can be identified by referring to other information held separately and not made available to researchers. Pseudonymous data is therefore still personal data and data protection law applies.