SALURBAL - Renovation Workflow
This workflow goes through the steps needed to generate FAIR SALURBAL data.
Overview
Due to the heterogeneity in existing SALURBAL data/codebooks, the process by which each dataset is renovated will differ. The Renovation manuals pages provide guidance for the FAIR renovation of existing SALURBAL data as well as instructions for new SALURBAL datasets. The steps are summarized in the list below.
- Step 1: Assign var_name to each variable
- Step 2: Summarize strata information for each variable
- Step 3: Evaluate metadata linkage
- Step 4: Renovate codebooks
- Step 5: Renovate data.csv
Deliverables at each step
The deliverables at each step are displayed in the tabs below. Within each step there are tabs that contain examples (table + downloadable csv) of deliverables for three different datasets.
var_name.csv is a table summarizing the var_name assigned to each variable. It should contain two columns:
- var_name
- dataset_id
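As a minimal sketch of this deliverable, the snippet below writes a var_name.csv with the two required columns and reads it back for a basic check, using only Python's standard csv module. The variable names, the dataset_id value "APS", and the helper check_var_name_rows are assumptions made for illustration; they are not taken from actual SALURBAL data.

```python
import csv
import io

REQUIRED_COLUMNS = ["var_name", "dataset_id"]

# Hypothetical rows: these variable names are placeholders, not real SALURBAL variables.
rows = [
    {"var_name": "EXAMPLEVAR1", "dataset_id": "APS"},
    {"var_name": "EXAMPLEVAR2", "dataset_id": "APS"},
]


def check_var_name_rows(rows):
    """Return a list of problems found in var_name.csv rows (empty if none)."""
    problems = []
    for i, row in enumerate(rows, start=1):
        if list(row.keys()) != REQUIRED_COLUMNS:
            problems.append(f"row {i}: columns must be exactly {REQUIRED_COLUMNS}")
            continue
        if not row["var_name"] or not row["dataset_id"]:
            problems.append(f"row {i}: empty value")
    return problems


# Write to an in-memory buffer; for a real file, use
# open("var_name.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=REQUIRED_COLUMNS)
writer.writeheader()
writer.writerows(rows)

# Read the table back and validate it.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
print(check_var_name_rows(loaded))
```

Running the check right after writing the file is one way to catch a missing or misnamed column before moving on to Step 2.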
strata.csv is a table that contains all possible strata_id for each variable. Strata information is organized in ‘long’ format, meaning that a stratified variable should have multiple rows. It should contain the following columns:
- var_name
- strata_1_name: name of the first strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_1_value: value of the first strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_2_name: name of the second strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
- strata_2_value: value of the second strata. Should have no spaces and no underscores (‘_’); all text should be in Pascal case.
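The naming rules for the strata columns can be checked mechanically. The sketch below is one possible interpretation, not an official SALURBAL validator: it treats Pascal case as "starts with an uppercase letter, followed only by letters or digits", and it assumes that unstratified variables leave their strata cells empty. The rows and the helpers is_pascal_case and check_strata_row are hypothetical.

```python
import re

# Assumed reading of Pascal case: an uppercase first letter, then letters or digits
# only, which also rules out spaces and underscores.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")

STRATA_COLUMNS = ["strata_1_name", "strata_1_value", "strata_2_name", "strata_2_value"]


def is_pascal_case(text):
    """True if the whole string matches the assumed Pascal case pattern."""
    return PASCAL_CASE.fullmatch(text) is not None


def check_strata_row(row):
    """Return the strata columns of one row that break the naming rules.

    Empty cells are skipped, on the assumption that unstratified
    variables leave their strata columns blank.
    """
    return [col for col in STRATA_COLUMNS if row.get(col) and not is_pascal_case(row[col])]


# Hypothetical long-format rows: EXAMPLEVAR1 is stratified by sex, so it has two rows.
rows = [
    {"var_name": "EXAMPLEVAR1", "strata_1_name": "Sex", "strata_1_value": "Female",
     "strata_2_name": "", "strata_2_value": ""},
    {"var_name": "EXAMPLEVAR1", "strata_1_name": "Sex", "strata_1_value": "Male",
     "strata_2_name": "", "strata_2_value": ""},
    # This row deliberately breaks the rules (underscore and spaces).
    {"var_name": "EXAMPLEVAR2", "strata_1_name": "age_group", "strata_1_value": "20 to 39",
     "strata_2_name": "", "strata_2_value": ""},
]

for row in rows:
    print(row["var_name"], check_strata_row(row))
```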
linkage.csv is a table that describes the linkage for each of the codebook fields. Starting with this template (📥 linkage.csv), for each codebook field (row), write a value of ‘1’ in a column cell if any variable falls under that linkage type.
All codebook fields are linkable only by variable for the APS dataset, so for all codebook fields we only check (fill out the cell as ‘1’) the by_var column.
Most of the codebook fields in the CNS dataset are linkable only by variable, except for:
- source: varies by var_name + iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.
Most of the codebook fields in the SVY dataset are linkable only by variable, except for:
- var_def: varies by var_name + strata for some variables but not for others, so this row has both by_var and by_var_strata filled out.
- source: varies by var_name + iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.
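To make the SVY example above concrete, the sketch below represents a few linkage.csv rows as dictionaries and reports which linkage types are in use. The column names by_var, by_var_iso2 and by_var_strata come from the text above; the "field" column name, the placeholder simple field, and both helper functions are assumptions, and the real template may contain additional columns.

```python
LINKAGE_COLUMNS = ["by_var", "by_var_iso2", "by_var_strata"]

# Rows for the two SVY fields called out above, plus one hypothetical field
# that is linkable only by variable (just by_var is set to "1").
linkage_rows = [
    {"field": "var_def", "by_var": "1", "by_var_iso2": "", "by_var_strata": "1"},
    {"field": "source", "by_var": "1", "by_var_iso2": "1", "by_var_strata": ""},
    {"field": "example_simple_field", "by_var": "1", "by_var_iso2": "", "by_var_strata": ""},
]


def linkage_types_in_use(rows):
    """Return the linkage columns that have at least one '1', in column order."""
    return [col for col in LINKAGE_COLUMNS if any(row.get(col) == "1" for row in rows)]


def fields_with_mixed_linkage(rows):
    """Return the fields that are checked under more than one linkage type."""
    return [
        row["field"]
        for row in rows
        if sum(row.get(col) == "1" for col in LINKAGE_COLUMNS) > 1
    ]


print(linkage_types_in_use(linkage_rows))
print(fields_with_mixed_linkage(linkage_rows))
```

Listing the fields with more than one checked column is a quick way to find the rows that need special handling when the codebooks are renovated.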
For each dataset, review the metadata linkage evaluation. For each unique linkage present in your dataset, you will need to prepare the associated codebook. The thought process and deliverables for our three example datasets can be seen below.
Based on the APS codebook evaluation, we saw that all the metadata are categorized as simple; therefore our Step 4 codebooks deliverable for this dataset contains one file: codebook_simple.csv
Based on the CNS codebook evaluation we saw:
- 17 simple fields
- 2 by_country fields (source, public)
Therefore we need to prepare two codebooks for this dataset (see below).
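The reasoning in the CNS example (one codebook per category of field) can be sketched as a simple grouping. In the snippet below, only source and public and their by_country category come from the example above; the simple field names are placeholders standing in for the 17 simple fields, the helper codebooks_needed is hypothetical, and the file name codebook_by_country.csv is assumed by analogy with codebook_simple.csv.

```python
from collections import defaultdict

# Category per codebook field. "source" and "public" follow the CNS example;
# the simple field names are hypothetical placeholders.
cns_field_categories = {
    "source": "by_country",
    "public": "by_country",
    "example_simple_field_1": "simple",
    "example_simple_field_2": "simple",
}


def codebooks_needed(field_categories):
    """Group fields by category; each category maps to one codebook file."""
    grouped = defaultdict(list)
    for field, category in field_categories.items():
        grouped[category].append(field)
    # File names are assumed to follow the codebook_<category>.csv pattern.
    return {f"codebook_{category}.csv": sorted(fields) for category, fields in grouped.items()}


for filename, fields in sorted(codebooks_needed(cns_field_categories).items()):
    print(filename, fields)
```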
Based on the SVY codebook evaluation we saw:
- 15 simple fields
- 2 by_country fields (source, public, censor)
- 2 by_strata fields (var_def, interpretation)
Therefore we need to prepare three codebooks for this dataset (see below).
For each dataset you should have a data.csv as a deliverable:
- aps_data.csv
- cns_data.csv
- svy_data.csv
Deliverables summary
For the three examples we provided, here are the final deliverable files.