F1. Globally unique and persistent Identifiers

Variables need unique ids

Within project id - var_name

The SALURBAL database is a collection of data items, each of which is an individual variable. Within the scope of the project, our primary concern is therefore that each variable has a unique identifier, which we term var_name. For example, the data item for the variable SALURBAL Life Expectancy is assigned the var_name LEAEA. No other variable or data item within SALURBAL has this identifier.

var_name Rules

  1. var_name is a string containing only letters and numbers and does not have spaces or special characters.
  2. var_name is a variable-level identifier and should not contain strata information. For example, SECLABPARTM and SECLABPARTF are invalid because they indicate that the variable SECLABPART is for the sex (male) and sex (female) strata; the correct var_name in this case is just SECLABPART. var_name is strictly for the variable; strata are captured in supplementary identifiers detailed in F3.
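
The two rules above could be checked roughly as follows. This is a minimal sketch, not the project's actual validation code: the function names are hypothetical, restricting rule 1 to ASCII letters and digits is an assumption, and the rule 2 check is only a heuristic (a name that extends an already registered var_name by one or two characters is flagged for manual review).

```python
import re

# Rule 1: only letters and numbers, no spaces or special characters.
# Assumption: "letters" means ASCII letters; the rules do not say either way.
VAR_NAME_PATTERN = re.compile(r"[A-Za-z0-9]+")


def is_valid_var_name(var_name: str) -> bool:
    """Return True if var_name satisfies rule 1."""
    return VAR_NAME_PATTERN.fullmatch(var_name) is not None


def possible_strata_variants(var_name: str, known_var_names: set) -> list:
    """Rule 2 heuristic (hypothetical): list registered var_names that
    var_name extends by a short suffix, e.g. SECLABPARTM -> SECLABPART."""
    return sorted(
        base
        for base in known_var_names
        if var_name != base
        and var_name.startswith(base)
        and len(var_name) - len(base) <= 2
    )
```

A non-empty result from possible_strata_variants does not prove that a name encodes strata; it only marks the name as worth a second look.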

Outside project id - DOI

The question of whether to mint DOIs at the collection level or the item level is somewhat philosophical, but a comparison to an old-school multi-volume encyclopedia helps. Do you catalog the encyclopedia as one thing, the individual volumes, or the individual entries within the encyclopedia?

It really comes down to how other people will reuse your data: should they cite one particular entry, or are they more likely to cite the entire collection as a whole? My intuition is that for SALURBAL, if people reuse our data, we would suggest a SALURBAL-wide project citation rather than citations of individual variables or working groups.

From a global perspective, DOIs are a common way to uniquely and persistently identify digital assets. After we have established a certain level of FAIRness, we can upload our data to a FAIR data repository (ICPSR), which will mint a DOI for our data collection. We can then append our within-project identifier, var_name, to the collection-level (SALURBAL-level) DOI to allow item-level identification. For example:

  • Data Asset: SALURBAL Life expectancy data
  • Within project unique identifier: LEAEA
  • SALURBAL project identifier (e.g. DOI): 10.1000/ICPSR/xyz123
  • Globally unique and persistent identifier: 10.1000/ICPSR/xyz123/LEAEA
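
The composition shown in the bullets above can be sketched in a few lines. This is an illustrative helper under stated assumptions, not an existing SALURBAL or ICPSR function: the name global_identifier is hypothetical, and joining the two parts with a single "/" simply mirrors the example.

```python
def global_identifier(collection_doi: str, var_name: str) -> str:
    """Append a within-project var_name to the collection-level DOI.

    Hypothetical helper: the "/" separator follows the example in the
    text, and var_name is checked against the letters-and-numbers rule
    (ASCII only, which is an assumption).
    """
    if not (var_name.isascii() and var_name.isalnum()):
        raise ValueError(f"invalid var_name: {var_name!r}")
    # Drop any trailing slash so the result has exactly one separator.
    return f"{collection_doi.rstrip('/')}/{var_name}"
```

Called with the project identifier from the example and the var_name LEAEA, it returns the item-level identifier given in the last bullet.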