SALURBAL - Renovation Workflow
This workflow goes through the steps needed to generate FAIR SALURBAL data.
Overview
Because existing SALURBAL data and codebooks are heterogeneous, the renovation process will differ from dataset to dataset. The Renovation manuals pages provide guidance for the FAIR renovation of existing SALURBAL data as well as instructions for new SALURBAL datasets. The steps are summarized in the list below.
- Step 1: Assign var_name to each variable
- Step 2: Summarize strata information for each variable
- Step 3: Evaluate metadata linkage
- Step 4: Renovate codebooks
- Step 5: Renovate data.csv
Deliverables at each step
The deliverables at each step are displayed in the tabs below. Within each step there are tabs that contain examples (table + downloadable csv) of deliverables for three different datasets.
var_name.csv is a table summarizing the var_name assigned to each variable. It should contain two columns:
- var_name
- dataset_id
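As a minimal sketch of how a two-column var_name.csv could be assembled, the Python snippet below builds one row per unique variable and serializes it as CSV. The variable names (`VARA`, `VARB`), the helper names, and the use of `APS` as the dataset_id value are assumptions made for illustration only; they are not taken from the actual SALURBAL files.

```python
import csv
import io


def build_var_name_table(var_names, dataset_id):
    """Return one row per unique variable, keeping first-seen order."""
    seen = set()
    rows = []
    for name in var_names:
        if name not in seen:
            seen.add(name)
            rows.append({"var_name": name, "dataset_id": dataset_id})
    return rows


def to_csv_text(rows, fieldnames):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical variable names; the duplicate "VARA" is collapsed to one row.
rows = build_var_name_table(["VARA", "VARB", "VARA"], "APS")
print(to_csv_text(rows, ["var_name", "dataset_id"]))
```

Writing the returned text to a file named var_name.csv would produce the deliverable for this step.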
strata.csv is a table that contains all possible strata_id values for each variable. It organizes strata information in ‘long’ format, meaning that a stratified variable should have multiple rows. It should contain the following columns:
- var_name
- strata_1_name: name of the first strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_1_value: value of the first strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_2_name: name of the second strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_2_value: value of the second strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
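The naming rule for the strata columns (no spaces, no underscores, Pascal case) can be checked automatically. The sketch below is one possible check, not an official SALURBAL validator: the regular expression is a strict reading of "Pascal case" (each word starts with an uppercase letter), and the example rows, variable names, and helper names are hypothetical.

```python
import re

STRATA_COLUMNS = ["strata_1_name", "strata_1_value", "strata_2_name", "strata_2_value"]

# Assumed reading of "Pascal case": one or more words, each starting with an
# uppercase letter and followed only by lowercase letters or digits.
_PASCAL = re.compile(r"(?:[A-Z][a-z0-9]*)+")


def is_pascal_case(text):
    """True if the text has no spaces or underscores and is in Pascal case."""
    return bool(_PASCAL.fullmatch(text))


def to_pascal_case(text):
    """Join the alphanumeric pieces of the text, capitalizing each piece."""
    parts = re.split(r"[^A-Za-z0-9]+", text)
    return "".join(p[:1].upper() + p[1:] for p in parts if p)


def invalid_cells(rows):
    """Return (row_index, column, value) for non-empty cells that break the rule."""
    problems = []
    for i, row in enumerate(rows):
        for col in STRATA_COLUMNS:
            value = row.get(col, "")
            if value and not is_pascal_case(value):
                problems.append((i, col, value))
    return problems


# Hypothetical strata rows in 'long' format: one row per variable and strata value.
example_rows = [
    {"var_name": "VARA", "strata_1_name": "Sex", "strata_1_value": "Female"},
    {"var_name": "VARA", "strata_1_name": "age_group", "strata_1_value": "20 to 39"},
]
print(invalid_cells(example_rows))
print(to_pascal_case("age_group"))
```

Values that begin with a digit fail this strict check even after conversion, so how to name such strata is left as a project decision.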
linkage.csv is a table that describes the linkage type for each of the codebook fields. Starting with this template (📥 linkage.csv), for each codebook field (row) you should write a value of ‘1’ in a column's cell if any variable falls under that linkage type.
All codebook fields in the APS dataset are linkable by variable only, so for every codebook field we check (fill in the cell with ‘1’) only the by_var column.
Most of the codebook fields in the CNS dataset are linkable by variable only, except for:
- source varies by var_name + iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.
Most of the codebook fields in the SVY dataset are linkable by variable only, except for:
- var_def varies by var_name + strata for some variables but not for others, so this row has both by_var and by_var_strata filled out.
- source varies by var_name + iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.
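To make the linkage evaluation concrete, the sketch below reads a small linkage table and reports which linkage columns are marked ‘1’ for each codebook field. The column names by_var, by_var_iso2 and by_var_strata come from the text above; the name of the first column (`field`), the `var_label` row, and the helper name are assumptions, since the actual template layout is not shown here.

```python
import csv
import io

LINKAGE_COLUMNS = ["by_var", "by_var_iso2", "by_var_strata"]


def linkage_types(csv_text):
    """Map each codebook field to the linkage columns marked with '1'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    result = {}
    for row in reader:
        result[row["field"]] = [
            col for col in LINKAGE_COLUMNS if (row.get(col) or "").strip() == "1"
        ]
    return result


# Hypothetical linkage table resembling the SVY description: 'source' also
# varies by iso2 and 'var_def' also varies by strata for some variables.
example = """field,by_var,by_var_iso2,by_var_strata
var_label,1,,
source,1,1,
var_def,1,,1
"""

for field, types in linkage_types(example).items():
    print(field, types)
```

A field with more than one linkage type, such as source in this example, signals that an additional codebook will be needed in the next step.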
For each dataset, review the metadata linkage evaluation. For each unique linkage type that is present in your dataset you will need to prepare the associated codebook. The thought process and deliverables for our three example datasets can be seen below.
Based on the APS codebook evaluation we saw that all the metadata are categorized as simple; therefore our Step 4 codebooks deliverable for this dataset contains one file: codebook_simple.csv
Based on the CNS codebook evaluation we saw:
- 17 simple fields
- 2 by_country fields (source, public)
Therefore we need to prepare two codebooks for this dataset (see below).
Based on the SVY codebook evaluation we saw:
- 15 simple fields
- 2 by_country fields (source, public, censor)
- 2 by_strata (var_def, interpretation)
Therefore we need to prepare three codebooks for this dataset (see below).
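The reasoning used for the three examples (one codebook per linkage category present) can be sketched in a few lines of Python. Only codebook_simple.csv is named in this workflow; the other file names below follow the same pattern as an assumption, and the `var_label` field and the helper name are hypothetical.

```python
CATEGORY_ORDER = ["simple", "by_country", "by_strata"]


def codebooks_needed(field_categories):
    """Return one codebook file name per category present, in a fixed order."""
    present = set(field_categories.values())
    # File names other than codebook_simple.csv are assumed, not confirmed.
    return [f"codebook_{cat}.csv" for cat in CATEGORY_ORDER if cat in present]


# Hypothetical, abbreviated evaluation resembling the CNS example:
# mostly simple fields plus two by_country fields (source, public).
cns_like = {
    "var_label": "simple",
    "source": "by_country",
    "public": "by_country",
}
print(codebooks_needed(cns_like))
```

A dataset whose fields are all simple yields a single file, which matches the APS example.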
For each dataset you should have a data.csv as a deliverable:
aps_data.csv
cns_data.csv
svy_data.csv
Deliverables summary
So, for the three examples we provided, here are the final deliverable files.