
Big data is interesting but small data has real impact


Katherine K. O’Sullivan

13 May 2024

This guest blog authored by Katherine O'Sullivan of the Grampian Data Safe Haven is part of a series spotlighting the work of Scotland’s Safe Havens, also known as Trusted Research Environments.

Linking data from different public sector areas is a vital step towards improving public services and the lives of people living in Scotland. For example, linking multi-agency population data with health data can help us to understand determinants of health and other outcomes, giving critical insights to health and social care organisations looking after regional populations, and in particular vulnerable populations within those regions. More importantly, research findings have the potential to improve health and social care service policies and shape how those services are delivered, ultimately improving people's lives now, and in the future. 

In recent research, University of Aberdeen researchers and NHS Grampian Health Board clinicians analysed regional trends in mental health services and prescribing for children and young people, with a focus on the outcomes of children deemed ‘at risk’ by Aberdeen City Council, as part of the Health Foundation’s Networked Data Lab. 

The processing and linkage of health and local authority data was performed by the Grampian Data Safe Haven (DaSH), a Trusted Research Environment and Regional Safe Haven for the north of Scotland. 

Processing the data for this project was a significant undertaking, and not without its challenges, but the research enabled by DaSH identified important trends in specialist mental health services for children and young people, and for vulnerable children in particular, and is a great example of the vital role that the Regional Safe Havens play in Scotland's research landscape.

Challenge one: cross-agency data sharing 

First, in order to link data between agencies, data sharing agreements need to be in place. What data could be shared between agencies to support this research is very different from what data should be shared.

The Caldicott Guardian Council sets out eight principles that researchers, and organisations facilitating access to data for research, must follow; one of these is the requirement to request the minimum amount of confidential data necessary. In this case, it took Information Governance teams across Aberdeen City Council, NHS Grampian, the University of Aberdeen and DaSH almost 18 months to agree on the specific, minimal set of confidential data to be shared between organisations that would answer the research question, and to receive ethical permission to undertake the research.

The time taken is certainly not conducive to rapid research (such as that required during the COVID-19 pandemic). But since this was the first research of its kind in north east Scotland, the teams wanted to get the data sharing agreements right. An important outcome, though, is that there is now a process in place for future cross-agency data sharing research projects, and timescales for ethical permissions should shorten as more researchers undertake research using multi-agency data linkage.

Challenge two: data processing and data linkage 

Once data sharing agreements are in place and researchers have obtained ethical permission to undertake sensitive data research, specialist staff at DaSH begin preparing the data to be shared with the researchers.

This involves: 

  • receiving data from the different organisations, 
  • checking that the data received is the data that should be provided, 
  • cleaning and preparing the data, 
  • de-identifying the data to ensure the researchers don't receive any identifiable information about data subjects, in line with rigorous privacy and confidentiality legislation, 
  • and finally linking the data together, enabling researchers to undertake their analysis. 
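The de-identification and linkage steps above can be illustrated with a minimal Python sketch. It is not DaSH's actual pipeline, which this post does not describe; it assumes a common pattern in which a direct identifier is replaced by a keyed, one-way pseudonym (here HMAC-SHA256) and the de-identified datasets are then joined on that pseudonym. The secret key, the field names (`person_id`, `name`) and the helper functions are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only inside the safe haven; never released to researchers.
SECRET_KEY = b"replace-with-a-securely-stored-secret"


def pseudonymise(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map an identifier to a stable, non-reversible study ID via HMAC-SHA256."""
    normalised = identifier.strip().upper().encode("utf-8")
    return hmac.new(key, normalised, hashlib.sha256).hexdigest()[:16]


def deidentify(records, id_field, drop_fields):
    """Replace id_field with a study_id and drop directly identifying fields."""
    cleaned = []
    for rec in records:
        row = {k: v for k, v in rec.items() if k != id_field and k not in drop_fields}
        row["study_id"] = pseudonymise(rec[id_field])
        cleaned.append(row)
    return cleaned


def link(left, right):
    """Inner-join two de-identified datasets on study_id."""
    by_id = {}
    for right_rec in right:
        by_id.setdefault(right_rec["study_id"], []).append(right_rec)
    return [
        {**left_rec, **right_rec}
        for left_rec in left
        for right_rec in by_id.get(left_rec["study_id"], [])
    ]


if __name__ == "__main__":
    # Made-up records; "person_id" and "name" are hypothetical field names.
    health = [{"person_id": "TEST-0001", "name": "A", "referrals": 2},
              {"person_id": "TEST-0002", "name": "B", "referrals": 0}]
    council = [{"person_id": "TEST-0001", "name": "A", "on_register": True}]
    linked = link(deidentify(health, "person_id", {"name"}),
                  deidentify(council, "person_id", {"name"}))
    print(sorted(linked[0]))  # ['on_register', 'referrals', 'study_id']
```

Because the same key always yields the same pseudonym for the same identifier, records from different organisations can still be joined while researchers only ever see `study_id`. Quasi-identifiers such as full date of birth or postcode would typically also need to be coarsened or removed before release.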

The biggest challenge we encountered in this project was understanding the data requirements across different organisations. DaSH work very closely with researchers to understand project requirements, research protocols and how cohorts are built, but other organisations are often less familiar with interpreting what can be highly technical documents.

The original data request was initially misinterpreted, so the data we received was incomplete, with a number of records missing. This required a re-extraction of the data and additional data processing, which delayed the researchers' analysis.

However, this identified an important learning for DaSH and researchers: when working with external data providers, it’s important to form a collaborative working group to define, translate and agree requirements. This will embed data science best practices across multi-agency data collaborations. 
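A routine reconciliation check, part of "checking that the data received is the data that should be provided", can help catch an incomplete extract early. The sketch below assumes, hypothetically, that the agreed specification can be expressed as a list of expected identifiers; it reports which expected records are missing, which are unexpected or duplicated, and how many arrived without an identifier. Real specifications are often defined by cohort criteria rather than a fixed list, and the function and field names here are illustrative only.

```python
from collections import Counter


def reconcile(expected_ids, received_records, id_field="person_id"):
    """Compare an agreed list of expected identifiers with the records a
    provider actually sent (hypothetical field name: person_id)."""
    expected = set(expected_ids)
    # Count records per identifier, ignoring rows where the identifier is blank.
    counts = Counter(rec[id_field] for rec in received_records if rec.get(id_field))
    received = set(counts)
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
        "duplicated": sorted(i for i, n in counts.items() if n > 1),
        "no_identifier": sum(1 for rec in received_records if not rec.get(id_field)),
    }


if __name__ == "__main__":
    # Made-up identifiers for illustration.
    report = reconcile(
        ["P1", "P2", "P3"],
        [{"person_id": "P1"}, {"person_id": "P1"}, {"person_id": "P4"}, {"person_id": ""}],
    )
    print(report["missing"])  # ['P2', 'P3']
```

A non-empty `missing` list is a prompt to go back to the provider and check how the request was interpreted, which is where an agreed, shared definition of the requirements pays off.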

“When working with external data providers, it’s important to form a collaborative working group to define, translate and agree requirements.”

Katherine K. O’Sullivan, Operational Lead, Grampian Data Safe Haven (DaSH)

Challenge three: vulnerable populations and data linkage 

In Scotland, it’s generally possible to link a person’s full health records via their Community Health Index (CHI) number. However, whilst adults in local authority data have their CHI number associated with their records, children and young people do not. As this project focused on children and young people’s data, DaSH had to undertake what is called CHI seeding: we triangulate information about an individual (forename, surname, date of birth, postcode history, address, etc.) and use a matching calculation to determine whether the details held in the health data match those provided by the local authority.

CHI seeding is a time-consuming process and requires lots of manual checks (and double checks!) against records. We were able to match 69% of children and young people deemed ‘at risk’ with a high degree of certainty, but could not match the remaining 31%.

The matching was difficult for two main reasons. Firstly, data on the ‘at risk’ group is captured manually by social workers, whose role (rightly) is to safeguard children rather than to act as data entry specialists, and details given by carers may have been inaccurate, particularly where the information was not supplied to the local authority by primary carers. Secondly, these children and young people often move between different carers and homes, and the resulting changes of address and postcode meant that two key matching criteria were unreliable.
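The triangulation behind CHI seeding can be sketched as a simple weighted field-agreement score. This is an assumption for illustration, not the calculation DaSH used: the weights, the penalty, the threshold of 8.0 and the names `Person`, `match_score` and `best_candidate` are all hypothetical. Names are normalised before comparison, and a postcode shared anywhere across the two postcode histories counts as agreement, which softens (but does not remove) the effect of frequent moves.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Person:
    """Hypothetical minimal record used for matching."""
    forename: str
    surname: str
    dob: date
    postcodes: set = field(default_factory=set)  # postcode history, not just the latest


# Illustrative weights only: agreement adds the field's weight, disagreement subtracts PENALTY.
WEIGHTS = {"forename": 2.0, "surname": 3.0, "dob": 4.0, "postcode": 2.0}
PENALTY = 1.0


def norm(text: str) -> str:
    """Upper-case and strip spaces/punctuation so 'ab10 1aa' matches 'AB101AA'."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def match_score(council: Person, health: Person) -> float:
    agreements = {
        "forename": norm(council.forename) == norm(health.forename),
        "surname": norm(council.surname) == norm(health.surname),
        "dob": council.dob == health.dob,
        # Any postcode shared across the two histories counts as agreement.
        "postcode": bool({norm(p) for p in council.postcodes}
                         & {norm(p) for p in health.postcodes}),
    }
    return sum(WEIGHTS[f] if agrees else -PENALTY for f, agrees in agreements.items())


def best_candidate(council: Person, candidates: list, threshold: float = 8.0) -> Optional[tuple]:
    """Return (index, score) of the highest-scoring candidate, or None if no
    candidate reaches the threshold (i.e. the record needs manual review)."""
    if not candidates:
        return None
    index, score = max(
        ((i, match_score(council, c)) for i, c in enumerate(candidates)),
        key=lambda pair: pair[1],
    )
    return (index, score) if score >= threshold else None


if __name__ == "__main__":
    # Made-up records: same child, but the health record only holds a later postcode.
    child = Person("Amy", "Smith", date(2010, 5, 1), {"AB10 1AA"})
    moved = Person("AMY", "smith", date(2010, 5, 1), {"AB15 9ZZ"})
    print(match_score(child, moved))  # 8.0: the postcode mismatch costs 3 points
```

Records that fall below the threshold come back as `None`, standing in for the manual checks described above. A production approach would more likely estimate probabilistic (Fellegi-Sunter style) weights from the data and flag near-ties between candidates for review, since `max` here simply keeps the first of any tied scores.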

Although we could not achieve a higher match rate, our work demonstrated that CHI matching for children and young people is critical to understanding the health needs of vulnerable children. As a result, there has been a Scotland-wide policy change to work towards CHI seeding all local authority records, including those of children and young people, to facilitate faster and easier linkage to health data and better support all vulnerable populations.

“The data linkage performed by DaSH for this project enabled national policy change that will make local authority and health data linkage for children and young people better for the whole of Scotland.”

Katherine K. O'Sullivan, Operational Lead, Grampian Data Safe Haven (DaSH)

The value of Regional Safe Havens 

Regional data can be an incredibly precise tool for making meaningful improvements to public sector services and outcomes. Moreover, the data linkage performed by DaSH for this project enabled national policy change that will make local authority and health data linkage for children and young people better for the whole of Scotland. The role of the Regional Safe Havens across Scotland in providing sensitive data for research ensures that regional population analysis is just as impactful as larger population-wide studies, offering real, implementable solutions that will improve health and care outcomes for our families, friends and neighbours.

To learn more about the studies and their findings, see the following publications:

Inequalities in children’s mental health care: analysis of routinely collected data on prescribing and referrals to secondary care

Serious mental health diagnoses in children on the Child Protection Register: a record linkage study
