Read about the synthetic data work of Research Data Scotland (RDS), led by Data Curation Manager Dr Lynne Adair.
Research Data Scotland (RDS) is working to improve economic, social and environmental wellbeing in Scotland by enabling access to and linkage of data about people, places and businesses for research in the public good. Within this remit, RDS, together with external partners and other data organisations, aims to develop a coordinated strategy for the production and use of synthetic data in Scotland. RDS will provide system leadership, facilitation, information governance support, resource funding and other support as required.
Our work on synthetic data is currently led by Dr Lynne Adair, Data Curation Manager. Lynne has set up a working group, run a user workshop, and is planning a synthetic data fund. She is also working with other organisations across the UK synthetic data landscape to help coordinate work on public engagement.
What is synthetic data?
Synthetic data is ‘a new copy of a data set that is generated at random but made to follow the structure and some of the patterns of the original data set. Each piece of information in the data set is meant to be plausible... but it is chosen randomly from the range of possible values, not by pointing to any original individual in the data set’ (see note 1).
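The definition above can be illustrated with a very small sketch. It assumes the simplest possible approach: each column of a new record is drawn at random from the values observed in that column, so the output keeps the structure and per-column patterns of the original without copying any one individual's record. The function name `synthesise` and the toy records are hypothetical, invented purely for illustration; this is not the method used by RDS or its partners.

```python
import random


def synthesise(rows, n, seed=None):
    """Return n synthetic rows by sampling each column independently.

    Illustrative assumption: columns are treated as unrelated, so
    per-column frequencies are roughly kept but relationships between
    columns are not.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Pool the observed values per column, keeping duplicates so that
    # common values are drawn more often than rare ones.
    pools = {col: [row[col] for row in rows] for col in columns}
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]


# Hypothetical toy records, invented for illustration only.
original = [
    {"age_band": "16-24", "region": "Lothian", "employed": True},
    {"age_band": "25-44", "region": "Grampian", "employed": True},
    {"age_band": "45-64", "region": "Lothian", "employed": False},
    {"age_band": "65+", "region": "Tayside", "employed": False},
]

synthetic = synthesise(original, n=10, seed=0)
for row in synthetic:
    print(row)
```

A synthetic row produced this way can still match an original row by chance, which is one reason disclosure risk needs to be measured rather than assumed away. Practical generators would also need to model the relationships between columns, which this sketch deliberately ignores.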
Synthetic datasets allow researchers to test approaches and write analysis code while awaiting the necessary permissions to use the actual data. Ideally, synthetic datasets would be produced as standard and made freely available to researchers to enable data discovery, code development, training in the use of linked administrative data, and AI/model training. RDS will investigate how to do this and make it part of business as usual.
Our work to date
RDS developed its synthetic data strategy through conversations with all four regional safe havens (RSHs), Public Health Scotland (PHS), the Office for National Statistics (ONS), Health Data Research UK (HDRUK), NHS National Services Scotland (NSS) and other partners. This strategy sets out the work that RDS will lead and fund.
As part of our strategy, RDS will survey researchers and other users on their synthetic data requirements; consult with data controllers and the public to gauge their understanding of, and concerns about, synthetic data; explore options with data controllers for synthetic data generation projects; bring together information governance (IG) expertise from different organisations; and map existing synthetic datasets to investigate whether they can be made more widely available. To oversee and operationalise plans for synthetic data, RDS has established a Synthetic Data Working Group. The group will identify similarities and differences in synthetic data needs, governance, and access across organisations.
RDS has also funded three projects relating to synthetic data. Professor Gillian Raab at the University of Edinburgh will lead a review on how to measure the disclosure risks from synthetic data. The West of Scotland Safe Haven and DataLoch will look at synthetic data governance by providing best practice guidelines on the risks posed by synthetic data. The Grampian Data Safe Haven (DaSH) will look at creating synthetic health datasets based on popular unconsented Scottish patient datasets.
In future, RDS hopes to work with data controllers to produce synthetic datasets for training, data discovery and code development on an ongoing basis.
Note 1: Dr Paul Calcraft, Dr Iorwerth Thomas, Martina Maglicic and Dr Alex Sutherland, ‘Accelerating public policy research with synthetic data’, a report from the Behavioural Insights Team (ADR UK).
Five minute profile: interview with Dr Lynne Adair
Our interview series shines a light on what it’s like to work at RDS. Meet Dr Lynne Adair, Research Data Scotland Data Curation Manager.
Research Data Scotland
05 May 2023