
Synthetic data

Read about Research Data Scotland (RDS)'s work on synthetic data.

Introduction 

Research Data Scotland (RDS) is working to improve economic, social and environmental wellbeing in Scotland by enabling access to, and linkage of, data about people, places and businesses for research for the public good. Within this remit, RDS aims to work with external partners and other data organisations to develop a coordinated strategy for producing and using synthetic data in Scotland. RDS will provide system leadership, facilitation, information governance support, funding and other support as required.

Our work to date includes setting up a working group, running user workshops and establishing the RDS synthetic data fund. We are also working with other organisations across the UK synthetic data landscape to help coordinate public engagement work.

What is synthetic data? 

Synthetic data is artificial data that contains no information about real people but follows some of the same patterns as real-world data. Each value in a synthetic dataset is usually designed to be plausible, but is generated at random based on the structure of the original, real data.
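To make the idea of "generated at random based on the structure of the real data" concrete, here is a minimal Python sketch of the simplest possible synthesis method. It assumes the real data is a list of records with categorical fields and samples each column independently from the values observed in the real data. This is an illustration only, not the method RDS uses: tools such as synthpop instead model each variable conditionally on the others (by default with classification and regression trees) so that relationships between variables are preserved. The function name `synthesise_marginals` and the toy records below are hypothetical and invented for illustration.

```python
import random


def synthesise_marginals(records, n, seed=None):
    """Hypothetical helper: create n synthetic records by sampling each
    column independently from the values observed in the real records."""
    rng = random.Random(seed)
    columns = list(records[0].keys())
    # Collect every observed value for each column (its empirical distribution)
    observed = {col: [r[col] for r in records] for col in columns}
    synthetic = []
    for _ in range(n):
        # Each field is drawn independently, so rows are assembled rather than
        # copied from real records (a row can still match a real one by chance),
        # and relationships between columns are NOT preserved.
        synthetic.append({col: rng.choice(observed[col]) for col in columns})
    return synthetic


# Invented toy "real" data, for illustration only
real = [
    {"age_band": "18-29", "region": "Lothian", "smoker": "no"},
    {"age_band": "30-49", "region": "Grampian", "smoker": "yes"},
    {"age_band": "50-64", "region": "Lothian", "smoker": "no"},
    {"age_band": "65+", "region": "Highland", "smoker": "no"},
]
fake = synthesise_marginals(real, n=10, seed=1)
for row in fake:
    print(row)
```

Every synthetic value is plausible on its own, but because columns are sampled separately, a combination such as age band and smoking status may not reflect real patterns. This is why practical synthesis methods model variables jointly, and why measuring disclosure risk (for example, how often synthetic rows coincide with real ones) still matters.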

Synthetic datasets allow researchers to test approaches and write analysis code while they wait for the permissions needed to use the real data. Ideally, synthetic datasets would be produced as standard and made freely available to researchers to support data discovery, code development, training in the use of linked administrative data, and AI and model training. RDS will investigate how to do this and make it part of business as usual.

Find out more about synthetic data in this short video, supported by funding from Health Data Research UK (HDR UK) and the Medical Research Council (MRC):

 

Watch this video with British Sign Language (BSL) interpretation

Synthetic data workshop: June 2024

Want to know more about working with synthetic data? Join our free workshop!

On Thursday 20 June, join us for a synthetic data workshop to learn more about using the synthpop package for R to create synthetic versions of confidential individual-level data. At the workshop, Professor Gillian Raab will give an introduction to synthpop and help users understand and work through issues. Participants must have a working knowledge of R and will need to bring their own laptop with synthpop installed.

The workshop will take place at the Bayes Centre, Edinburgh on Thursday 20 June (10:00 – 12:00). The event is free to attend.
 
To register or for more info, please contact: Sophie.McCall@researchdata.scot

Synthetic data fund: Autumn 2023

In Autumn 2023, we launched a new fund for non-commercial organisations in Scotland to explore the use of synthetic data.

Approximately £100,000 was awarded to support work that fits within the remit of the RDS synthetic data strategy, particularly in the following areas:

  • Disclosure risk and information governance (IG)
  • Synthesis of data
  • Access, promotion and engagement

Find out more about the fund recipients

Our work to date  

The RDS synthetic data strategy was developed through conversations with all four regional safe havens (RSHs), Public Health Scotland (PHS), the Office for National Statistics (ONS), Health Data Research UK (HDR UK), NHS National Services Scotland (NSS) and other partners. The strategy sets out the work that RDS will lead and fund.

As part of our strategy, RDS will:

  • survey researchers and other users on their synthetic data requirements
  • consult data controllers and the public to gauge their understanding of, and concerns about, synthetic data
  • explore options with data controllers for synthetic data generation projects
  • bring together IG expertise from different organisations
  • map existing synthetic datasets to find out whether those already developed can be made more widely available

To oversee and operationalise plans for synthetic data, RDS has established a Synthetic Data Working Group. The group will identify similarities and differences in synthetic data needs, governance and access across different organisations.

RDS has also funded three projects relating to synthetic data. Professor Gillian Raab at the University of Edinburgh will lead a review of how to measure the disclosure risks of synthetic data. The West of Scotland Safe Haven and DataLoch will look at synthetic data governance, providing best practice guidelines on the risks posed by synthetic data. The Grampian Data Safe Haven (DaSH) will look at creating synthetic health datasets based on widely used, unconsented Scottish patient datasets.

Future work 

In future, RDS hopes to work with data controllers to produce synthetic datasets for training, data discovery and code development on an ongoing basis.

Related content


Current projects

Learn more about the data and access projects we're working on.

View projects

Data explainers

We've created some data explainers to help everyone understand common terminology, frameworks and principles.

View resources

Public engagement

Find out how we're raising the profile of data for research through our work with the public.

Find out more
