Indexing & linkage


Indexing is the process of replacing a unique identifier such as the CHI number, or a combination of identifiers such as name, date of birth and postcode, with a randomly assigned study ID, so that records can still be linked but no explicit “real world” identifiable information about an individual is made public. The process uses probability matching of the available personal identifiers to a population spine and returns an anonymised unique person identifier specific to the project. For health data where PHS is the data controller, eDRIS can perform the indexing; otherwise, indexing is provided by a trusted third party, either the National Records of Scotland (NRS) Indexing Team or the PHS CHI Linkage (CHILi) Team, as required.

An image illustrating the indexing process can be found on the NRS website.
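
To make the idea of probability matching concrete, the following is a minimal sketch of weight-based matching in the general style associated with Newcombe's work. It is not the algorithm used by the NRS or PHS indexing teams, which is not described here. The field names, the m/u probabilities, the threshold and all function names are hypothetical and chosen only for illustration.

```python
import math
import secrets

# Hypothetical m/u probabilities per identifier (illustrative values only).
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "date_of_birth": (0.98, 0.003),
    "postcode": (0.85, 0.001),
}


def match_weight(record, spine_entry):
    """Sum log2 agreement or disagreement weights across the identifiers."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if record[field] == spine_entry[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total


def best_spine_match(record, spine, threshold=10.0):
    """Return the spine ID of the best-scoring entry, or None if the best
    score is below the (hypothetical) acceptance threshold."""
    if not spine:
        return None
    best = max(spine, key=lambda entry: match_weight(record, entry))
    if match_weight(record, best) >= threshold:
        return best["spine_id"]
    return None


def assign_index_ids(records, spine):
    """Map each provider record ID to a random, project-specific index ID.

    Records matched to the same spine person share one index ID;
    unmatched records map to None.
    """
    index_for_person = {}
    result = {}
    for rec in records:
        spine_id = best_spine_match(rec, spine)
        if spine_id is None:
            result[rec["record_id"]] = None
            continue
        if spine_id not in index_for_person:
            # Random token, so the index ID reveals nothing about the person.
            index_for_person[spine_id] = secrets.token_hex(8)
        result[rec["record_id"]] = index_for_person[spine_id]
    return result
```

In this sketch a record whose postcode has changed can still be matched to the right person, because agreement on surname and date of birth outweighs the single disagreement. That tolerance of imperfect identifiers is the reason for using probability matching rather than exact matching.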


Data for your study will be linked using established probability matching techniques based on the Howard Newcombe principles. Linkage will be undertaken by a trusted third party, with a Population Spine used as an intermediary linkage tool. The Population Spine contains the personal identifiers of all individuals in Scotland who have been in contact with NHS Scotland. The steps in the linkage process are detailed below:

  1. Data Providers will supply personal identifiers (only) plus their own person or record ID number to the indexing team. 
  2. The indexing team will probability match the identifiers to the Population Spine using complex algorithms.
  3. The Data Provider will receive a file back with their own person or record ID number and a unique person index ID number specific to that dataset. This is generated by the indexing team.
  4. The Data Provider will attach the received index ID number to the remaining content of the dataset to be provided for linkage and send it to their Research Coordinator.
  5. The Research Coordinator will confirm that the agreed data has been received and send the file to the linkage agent. The linkage agent is an automated computer program which carries out the linkage.
  6. The linkage agent will receive two inputs: all the datasets with their unique person index ID numbers, plus a master control file containing a master person ID and all the dataset-specific person index ID numbers.
  7. The linkage agent will then replace the dataset-specific person index ID numbers with the master person ID number in each of the content data files.
  8. This allows the person analysing the data to see all the records belonging to an individual across all the datasets without the need to see the personal identifiers.
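
Steps 4 to 8 above can be sketched in a few lines of Python. This is an illustration of the data flow only, not the actual linkage agent. The file layouts, the field names (`record_id`, `index_id`, `master_id`) and all function names are assumptions made for the example, and the content rows are assumed to contain no personal identifiers.

```python
def attach_index_ids(content_rows, index_lookup):
    """Step 4: the Data Provider attaches the returned index ID to each
    content row, using its own record ID as the key."""
    return [
        {**row, "index_id": index_lookup[row["record_id"]]}
        for row in content_rows
    ]


def build_master_lookup(master_control):
    """Flatten the master control file into (dataset, index_id) -> master_id."""
    lookup = {}
    for entry in master_control:
        for dataset_name, index_id in entry["index_ids"].items():
            lookup[(dataset_name, index_id)] = entry["master_id"]
    return lookup


def link_datasets(datasets, master_control):
    """Steps 6 and 7: replace each dataset-specific index ID with the
    master person ID in every content file."""
    lookup = build_master_lookup(master_control)
    linked = {}
    for name, rows in datasets.items():
        linked[name] = []
        for row in rows:
            new_row = {k: v for k, v in row.items() if k != "index_id"}
            new_row["master_id"] = lookup[(name, row["index_id"])]
            linked[name].append(new_row)
    return linked


def records_for_person(linked, master_id):
    """Step 8: collect one person's records across all linked datasets."""
    return {
        name: [row for row in rows if row["master_id"] == master_id]
        for name, rows in linked.items()
    }


# Illustrative data: the same person has a different index ID in each dataset.
hospital = attach_index_ids(
    [{"record_id": "H-001", "admission": "2020-03-01"}],
    {"H-001": "hosp-7f3a"},
)
education = attach_index_ids(
    [{"record_id": "E-042", "qualification": "Higher"}],
    {"E-042": "educ-91bc"},
)
master_control = [
    {"master_id": "M-0001",
     "index_ids": {"hospital": "hosp-7f3a", "education": "educ-91bc"}},
]
linked = link_datasets({"hospital": hospital, "education": education},
                       master_control)
person = records_for_person(linked, "M-0001")
```

Because each dataset carries a different index ID for the same person, two content files cannot be joined to each other directly. Only the master control file connects them, which is why the replacement in step 7 is what makes cross-dataset analysis possible.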