What is data de-identification?

One way of keeping data confidential is to remove personal information. Learn more about this process, known as de-identification.

How is data de-identified?

The information that has been collected in our health records, when combined with data from many other people, provides researchers with hugely important insights into the health of the entire country.

Scotland’s Trusted Research Environments (TREs), also known as Data Safe Havens (and as Secure Data Environments in England and Wales), provide researchers with a secure computing environment in which to examine large volumes of health data from thousands of different people, while keeping the personal health data of people in Scotland safe.

It is important that data about us remains confidential and secure.

One way to do this is to remove all ‘identifiers’ from the data. These include, for example, names, addresses and the unique personal identifier used by the NHS, the Community Health Index (CHI). The record can be de-identified even further by giving it a completely new ‘pseudonym’: a random code that is unique to each research project. This keeps all the information in the record together without revealing who the individual is.
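The steps described above (dropping identifiers, then attaching a random, project-specific pseudonym) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular TRE implements the process: the field names `name`, `address` and `chi`, the `deidentify` function and the sample records are all hypothetical, and the values are made up.

```python
import secrets

# Hypothetical field names treated as direct identifiers (an assumption).
IDENTIFIER_FIELDS = ("name", "address", "chi")


def deidentify(records, link_field="chi"):
    """Strip identifiers and attach a random pseudonym to each record.

    Returns (deidentified_records, key_table). The key_table maps each
    original identifier to its pseudonym; in practice it would be held
    separately and never passed to researchers.
    """
    key_table = {}
    used = set()
    deidentified = []
    for record in records:
        person = record[link_field]
        # Reuse the pseudonym for a person already seen, so that all of
        # their rows stay linked within this project.
        if person not in key_table:
            pseudonym = secrets.token_hex(8)
            while pseudonym in used:  # guard against a repeated code
                pseudonym = secrets.token_hex(8)
            used.add(pseudonym)
            key_table[person] = pseudonym
        # Keep every field except the direct identifiers.
        cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        cleaned["pseudonym"] = key_table[person]
        deidentified.append(cleaned)
    return deidentified, key_table


# Made-up sample records; the identifier values are placeholders.
sample_records = [
    {"name": "Person A", "address": "1 Example St", "chi": "0000000001", "diagnosis": "asthma"},
    {"name": "Person A", "address": "1 Example St", "chi": "0000000001", "diagnosis": "eczema"},
    {"name": "Person B", "address": "2 Example St", "chi": "0000000002", "diagnosis": "asthma"},
]

project_data, project_key = deidentify(sample_records)
```

Calling `deidentify` again for a different research project generates fresh random codes, so the same person receives a different pseudonym in each project. Because the codes are random rather than derived from the CHI, they cannot be reversed without the separately held key table.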

An extra step is to make sure that nobody handling the data – for instance preparing it for a researcher to look at – has been involved in the process of de-identifying it.

According to the UK Information Commissioner’s Office (ICO), an independent body set up to uphold information rights, this process of de-identification, known as ‘pseudonymisation’, is designed to reduce the risk of individuals being identified from the data. However, it cannot eliminate that risk completely, which is why it is only one of a number of technical and organisational measures taken to protect people’s privacy.

Learn more about the other ways data is kept secure in this explainer: What are Trusted Research Environments?

Definitions

The ICO offer these definitions to explain the difference between anonymisation and pseudonymisation:

Anonymisation means that individuals are not identifiable and cannot be re-identified by any means reasonably likely to be used (i.e., the risk of re-identification is sufficiently remote). Anonymous information is not personal data and data protection law does not apply.

Pseudonymisation means that individuals are not identifiable from the dataset itself but can be identified by referring to other information held separately and not made available to researchers. Pseudonymous data is therefore still personal data and data protection law applies.

Discover more about data

Intro to public sector data

Learn more about what public sector data is and how it's used in research.

What is the Five Safes framework?

Discover what the Five Safes framework is and how it's used to keep data secure.

What are Trusted Research Environments?

Learn about Trusted Research Environments and how they help researchers access data.
