Risks, challenges and limitations
Protected characteristics data can improve lives, but only if it is collected and used accurately and responsibly. Participants described a range of technical and ethical challenges, risks and limitations they face when collecting, sharing and using protected characteristics data.
Issues with data collection: Technical and resource constraints can limit the collection of this type of data. So can a lack of understanding, among those providing or collecting the data, of why accurate recording matters. For example, data collectors may record incorrect data if they make assumptions about an individual's identity rather than allowing them to self-report.
Missing data: These collection issues lead to missing data on protected characteristics, a frequent concern among respondents. Participants reported that data on some protected characteristics, such as ethnicity, gender identity and disability, was particularly lacking. Limited representation risks making the needs of some groups invisible, and these groups often overlap with populations that are already marginalised.
Poor quality data: Data on marginalised groups is not only more likely to be missing but also of poorer quality. This risks misrepresenting the circumstances of the groups the data describes. The data quality issues participants cited include inaccurate protected characteristics data, inconsistencies between data sources, and biased data.
Issues with data sharing: Balancing data utility against privacy was a major challenge. It often slowed research, reduced the granularity of the data that could be shared, or prevented data release altogether.
Issues with analysis: Combining groups with small counts risks making those groups underrepresented or invisible in research. For example, grouping ethnic categories to reduce the risk of re-identifying individuals can obscure important differences between those groups.
Risks of using and interpreting protected characteristics data: Respondents were concerned that equalities data could be misused or misinterpreted in ways that reinforce stigma, misrepresent certain groups, or lead to discriminatory outcomes. Where these harms occur, groups and individuals can lose trust in the organisations that collect and store their data. This reduces their willingness to contribute data in future and further limits representation within the data. The risks are greatest for small or marginalised groups, for whom poor interpretation can deepen exclusion.
Focus group discussions showed that there is no single, clear-cut solution to these difficult trade-offs. The risks of doing the research, including potential negative impacts on communities, must be weighed against the risks of not doing it and losing the opportunity to reveal and address societal inequalities.