SALURBAL - Renovation Workflow

This workflow goes through the steps needed to generate FAIR SALURBAL data.

Overview

Because existing SALURBAL data and codebooks are heterogeneous, the renovation process will differ for each dataset. The Renovation manual pages provide guidance for the FAIR renovation of existing SALURBAL data as well as instructions for new SALURBAL datasets. The steps are summarized in the list below.

Step 1: Assign var_name to each variable
Step 2: Summarize strata information for each variable
Step 3: Evaluate metadata linkage
Step 4: Renovate codebooks
Step 5: Renovate data.csv

Deliverables at each step

The deliverables at each step are displayed in the tabs below. Within each step there are tabs that contain examples (table + downloadable csv) of deliverables for three different datasets.

var_name.csv is a table summarizing the var_name assigned to each variable. It should contain two columns:

var_name
dataset_id
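
As an illustration of the two-column layout described above, here is a minimal Python sketch that writes such a table. The variable names and the dataset_id value are hypothetical placeholders, not actual SALURBAL identifiers.

```python
import csv
import io

# Hypothetical variables for a dataset whose dataset_id is "APS".
rows = [
    {"var_name": "APSPM25MEAN", "dataset_id": "APS"},
    {"var_name": "APSNO2MEAN", "dataset_id": "APS"},
]

def write_var_name_csv(rows, handle):
    """Write the var_name table (var_name, dataset_id) to an open text handle."""
    writer = csv.DictWriter(
        handle, fieldnames=["var_name", "dataset_id"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)

# Write to an in-memory buffer; pass an open file handle to create var_name.csv on disk.
buffer = io.StringIO()
write_var_name_csv(rows, buffer)
var_name_csv_text = buffer.getvalue()
print(var_name_csv_text)
```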

strata.csv is a table that contains all possible strata_id values for each variable. Strata information is organized in ‘long’ format, meaning that a stratified variable has multiple rows. It should contain the following columns:

var_name
strata_1_name: name of the first stratum. It should contain no spaces and no underscores (‘_’); all text should be in Pascal case.
strata_1_value: value of the first stratum. It should contain no spaces and no underscores (‘_’); all text should be in Pascal case.
strata_2_name: name of the second stratum. It should contain no spaces and no underscores (‘_’); all text should be in Pascal case.
strata_2_value: value of the second stratum. It should contain no spaces and no underscores (‘_’); all text should be in Pascal case.
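
The formatting rules for the strata columns can be checked with a short script. The sketch below is one possible approach in Python: the example rows, variable name, and strata labels are hypothetical, and treating empty cells as acceptable is an assumption. It only flags spaces and underscores and suggests a Pascal-case replacement; it does not verify capitalization of values that are already a single word.

```python
import re

# Hypothetical long-format rows: one row per variable/strata combination.
strata_rows = [
    {"var_name": "SVYOBESITY", "strata_1_name": "Sex", "strata_1_value": "Female",
     "strata_2_name": "AgeGroup", "strata_2_value": "Adult"},
    {"var_name": "SVYOBESITY", "strata_1_name": "Sex", "strata_1_value": "Male",
     "strata_2_name": "Age_Group", "strata_2_value": "older adult"},
]

STRATA_COLUMNS = ["strata_1_name", "strata_1_value", "strata_2_name", "strata_2_value"]

def to_pascal_case(text):
    """Join words split on spaces or underscores, capitalizing the first letter of each."""
    words = re.split(r"[\s_]+", text.strip())
    return "".join(word[:1].upper() + word[1:] for word in words if word)

def find_invalid_cells(rows):
    """Return (row_index, column, value) for strata cells with spaces or underscores."""
    problems = []
    for index, row in enumerate(rows):
        for column in STRATA_COLUMNS:
            value = row.get(column, "")
            # Assumption: empty cells (unstratified variables) are allowed.
            if value and re.search(r"[\s_]", value):
                problems.append((index, column, value))
    return problems

invalid = find_invalid_cells(strata_rows)
suggestions = {value: to_pascal_case(value) for _, _, value in invalid}
print(invalid)
print(suggestions)
```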

linkage.csv is a table that describes the linkage for each of the codebook fields. Starting with this template (📥 linkage.csv), for each codebook field (row), write a value of ‘1’ in each column whose linkage type applies to at least one variable.

All codebook fields in the APS dataset are linkable only by variable, so for every codebook field we check (fill in the cell with ‘1’) only the by_var column.

Most of the codebook fields in the CNS dataset are linkable only by variable, except for:

source varies by var_name+iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.

Most of the codebook fields in the SVY dataset are linkable only by variable, except for:

var_def varies by var_name+strata for some variables but not for others, so this row has both by_var and by_var_strata filled out.
source varies by var_name+iso2 for some variables but not for others, so this row has both by_var and by_var_iso2 filled out.
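
A filled-out linkage table can be summarized programmatically. The following Python sketch reads a small excerpt shaped like the SVY example and reports which linkage types are marked for each codebook field. The header name "field" and the var_label row are assumptions for illustration, since the actual template layout is not reproduced here.

```python
import csv
import io

# Hypothetical excerpt of a filled-out linkage.csv.
# Assumption: the first column is named "field"; a '1' marks an applicable linkage type.
LINKAGE_CSV = """field,by_var,by_var_iso2,by_var_strata
var_label,1,,
var_def,1,,1
source,1,1,
"""

def linkage_types_by_field(text):
    """Map each codebook field to the list of linkage columns marked with '1'."""
    reader = csv.DictReader(io.StringIO(text))
    result = {}
    for row in reader:
        result[row["field"]] = [
            column
            for column, value in row.items()
            if column != "field" and (value or "").strip() == "1"
        ]
    return result

linkage = linkage_types_by_field(LINKAGE_CSV)
# Unique linkage types present anywhere in the dataset.
unique_types = sorted({t for types in linkage.values() for t in types})
print(linkage)
print(unique_types)
```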

For each dataset, review the metadata linkage evaluation. For each unique linkage type present in your dataset, you will need to prepare the associated codebook. The thought process and deliverables for our three example datasets can be seen below.

Based on the APS codebook evaluation, all the metadata are categorized as simple; therefore our Step 4 codebooks deliverable for this dataset contains one file, codebook_simple.csv.

codebook_simple.csv

Based on the CNS codebook evaluation we saw:

17 simple fields
2 by_country fields (source, public)

Therefore we need to prepare two codebooks for this dataset (see below).

codebook_simple.csv
codebook_by_iso2.csv
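
The reasoning used for the APS and CNS examples, from linkage types to codebook files, can be sketched in Python as a simple lookup. The pairing of by_var with codebook_simple.csv and by_var_iso2 with codebook_by_iso2.csv is inferred from those examples; the file name codebook_by_strata.csv is an assumption and is not confirmed by the examples shown here.

```python
# Linkage type -> codebook file. The first two pairs follow the APS and CNS examples;
# the by_var_strata file name is an assumed placeholder.
CODEBOOK_FILES = {
    "by_var": "codebook_simple.csv",
    "by_var_iso2": "codebook_by_iso2.csv",
    "by_var_strata": "codebook_by_strata.csv",  # assumption
}

def codebooks_needed(linkage_types):
    """Return the codebook files required for the linkage types present in a dataset."""
    present = set(linkage_types)
    return [name for linkage, name in CODEBOOK_FILES.items() if linkage in present]

aps_codebooks = codebooks_needed(["by_var"])
cns_codebooks = codebooks_needed(["by_var", "by_var_iso2"])
print(aps_codebooks)
print(cns_codebooks)
```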

Based on the SVY codebook evaluation we saw:

15 simple fields
2 by_country fields (source, public, censor)
2 by_strata fields (var_def, interpretation)

Therefore we need to prepare three codebooks for this dataset (see below).

For each dataset you should have a data.csv as a deliverable:

aps_data.csv

cns_data.csv

svy_data.csv

Deliverables summary

For the three examples provided, these are the final deliverable files.