
Synthetic data

Read about Research Data Scotland (RDS)'s work on synthetic data, led by Data Curation Manager Dr. Lynne Adair.

Introduction 

Research Data Scotland (RDS) is working to improve economic, social and environmental wellbeing in Scotland by enabling access to and linkage of data about people, places and businesses for research in the public good. Within this remit, RDS, together with external partners and other data organisations, aims to develop a coordinated strategy for the production and use of synthetic data in Scotland. RDS will provide system leadership, facilitation, information governance support, resource funding and other support as required.

Our work on synthetic data is currently led by Dr. Lynne Adair, Data Curation Manager. Lynne has set up a working group, run a user workshop, and is leading the RDS synthetic data fund. She is also working with other organisations across the UK synthetic data landscape to help coordinate work on public engagement.

What is synthetic data? 

Synthetic data is ‘a new copy of a data set that is generated at random but made to follow the structure and some of the patterns of the original data set. Each piece of information in the data set is meant to be plausible... but it is chosen randomly from the range of possible values, not by pointing to any original individual in the data set’ (see note 1). 

Synthetic datasets allow researchers to test approaches and write code that analyses data while awaiting the necessary permissions to use actual data. Ideally, synthetic datasets would be produced as standard and made freely available to researchers for data discovery, code development, training in the use of linked administrative data, and AI/model training. RDS will investigate how to do this and make it part of business as usual.
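
To make the definition quoted above concrete, here is a minimal sketch in Python. It is an illustration only, not the method used by RDS or its partners. The records, the column names and the `synthesise` function are all hypothetical, and the approach shown (drawing each column's values independently from those observed in the original) is just one very simple way of generating plausible rows at random.

```python
import random

# Toy "original" records, invented purely for illustration (not real data).
ORIGINAL = [
    {"age_band": "16-24", "region": "Grampian", "visits": 0},
    {"age_band": "25-44", "region": "Lothian", "visits": 2},
    {"age_band": "25-44", "region": "Tayside", "visits": 1},
    {"age_band": "45-64", "region": "Lothian", "visits": 3},
    {"age_band": "65+", "region": "Grampian", "visits": 5},
]


def synthesise(rows, n_rows, seed=None):
    """Return n_rows synthetic records with the same columns as rows.

    Each value is drawn at random from the values seen in that column,
    independently of the other columns, so no synthetic record is built
    by pointing at one original individual.
    """
    rng = random.Random(seed)
    columns = list(rows[0])
    # One pool of observed values per column; repeats are kept so that
    # common values are drawn more often than rare ones.
    pools = {col: [row[col] for row in rows] for col in columns}
    return [
        {col: rng.choice(pools[col]) for col in columns}
        for _ in range(n_rows)
    ]


# Hypothetical usage: generate eight synthetic rows from the toy data.
synthetic = synthesise(ORIGINAL, n_rows=8, seed=42)
for record in synthetic:
    print(record)
```

A sketch like this keeps the structure of the original (the same columns and plausible values) but loses any relationship between columns, and a synthetic row can still match an original row by chance. Practical synthesis methods aim to preserve more of the patterns in the data, and the chance of such matches is one reason disclosure risk needs to be assessed.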

Synthetic data fund: Autumn 2023

In Autumn 2023, we launched a new fund for non-commercial organisations in Scotland to explore the use of synthetic data.

Approximately £100,000 was awarded to non-commercial organisations to help support work that fits within the remit of the RDS synthetic data strategy and particularly within the following areas:

  • Disclosure risk and information governance (IG)
  • Synthesis of data
  • Access, promotion and engagement

Find out more about the fund recipients

Our work to date  

Conversations with all four regional safe havens (RSHs), Public Health Scotland (PHS), the Office for National Statistics (ONS), Health Data Research UK (HDRUK), NHS National Services Scotland (NSS) and other partners were used to create the RDS synthetic data strategy. This strategy sets out the work that RDS will lead and fund.

As part of our strategy, RDS will:

  • survey researchers/users on their synthetic data requirements
  • consult with data controllers and the public to gauge their understanding and concerns around synthetic data
  • explore options with data controllers for synthetic data generation projects
  • bring together IG expertise in different organisations
  • map existing synthetic datasets to investigate whether we can make synthetic datasets that are already developed more widely available

To oversee and operationalise plans for synthetic data, RDS has established a Synthetic Data Working Group. The group will identify similarities and differences in synthetic data needs, governance, and access for different organisations.

RDS has also funded three projects relating to synthetic data. Professor Gillian Raab at the University of Edinburgh will lead a review on how to measure the disclosure risks from synthetic data. The West of Scotland Safe Haven and DataLoch will look at synthetic data governance by providing best practice guidelines on the risks posed by synthetic data. The Grampian Data Safe Haven (DaSH) will look at creating synthetic health datasets based on popular unconsented Scottish patient datasets.  

Future work 

In future, RDS hopes to work with data controllers to produce synthetic datasets for training, data discovery and code development on an ongoing basis. 

 

Note 1: Dr. Paul Calcraft, Dr. Iorwerth Thomas, Martina Maglicic and Dr. Alex Sutherland, Accelerating public policy research with synthetic data, a report from the Behavioural Insights Team (ADR UK).
