FILE SYSTEM AND LABELING
Each dataset in the CFE database is provided as a source file containing absolute numbers of respondents by parity, further tabulated by other characteristics. The filename is constructed as COUNTRY_SURVEY YEAR.csv, where:
- COUNTRY is the three-letter international country code (ISO alpha-3);
- SURVEY is the type of survey (usually one of Census, Survey, Register);
- YEAR is the year of the census or survey.
For example, the 2001 census data for Austria are included in the file Austria_Census 2001.csv.
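The naming scheme above can be taken apart programmatically. The following is a minimal sketch in R, assuming the YEAR part is always four digits and that the COUNTRY part contains no underscore; the helper name parse_cfe_filename is hypothetical and not part of the database.

```r
# Hypothetical helper (not part of the CFE database): split a file name of the
# form COUNTRY_SURVEY YEAR.csv into its three components.
# Assumptions: YEAR has four digits; COUNTRY contains no underscore.
parse_cfe_filename <- function(filename) {
  stem <- sub("\\.csv$", "", basename(filename))
  pattern <- "^([^_]+)_(.+) ([0-9]{4})$"
  if (!grepl(pattern, stem)) {
    stop("Unexpected file name: ", filename)
  }
  list(
    country = sub(pattern, "\\1", stem),
    survey  = sub(pattern, "\\2", stem),
    year    = as.integer(sub(pattern, "\\3", stem))
  )
}

parts <- parse_cfe_filename("Austria_Census 2001.csv")
str(parts)
```

File names that do not follow the pattern raise an error rather than being parsed silently, which may help when looping over a whole directory of source files.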
The data are stored as comma-separated values (CSV) files in long format with a header row. In the statistical package R, the following command can be used to import the data:
read.csv("Austria_Census 2001.csv", header = TRUE)
The first line contains the header, with the following column names:
country, data_source, cohort_from, cohort_to, edu_eurrep, edu_from, edu_to, sex, origin, stat, value
- country – country name;
- data_source – simple label of data source including the time of survey/census (e.g. Census 2001);
- cohort_from, cohort_to – respondents’ birth cohort range (e.g. 1946-1950); these two columns are equal when one-year birth cohorts are displayed; unknown cohorts are labelled as -1;
- edu_eurrep, edu_from, edu_to – coding of education, described in the next section;
- sex – F for women, M for men (when included);
- origin – Native for respondents born in the country or with the country’s citizenship, Foreign for respondents born in another country or with a foreign citizenship, Total when the origin is not distinguished;
- stat – name of the indicator listed in the next column, labelled value:
  - women_total/men_total – total number of respondents;
  - children_total – the total number of children ever born to all respondents;
  - parity_0 – the number of childless respondents;
  - parity_i (i = 1, 2, …, i_max) – number of respondents with i children; the maximum-parity category i_max differs across surveys and is open-ended, i.e. it includes all respondents with i_max or more children (i_max+);
  - parity_unknown – number of respondents for whom the number of children is not known;
- value – number of cases.
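As an illustration of the long format described above, the sketch below builds a few toy rows and derives a parity distribution from them. The counts are invented for illustration and are not taken from the CFE database; the three education columns are left out for brevity, and the object names (cfe, parity, childless_share) are arbitrary.

```r
# Toy rows in the documented column layout (education columns omitted).
# The counts are invented for illustration, NOT real CFE data.
cfe <- data.frame(
  country     = "Austria",
  data_source = "Census 2001",
  cohort_from = 1946L,
  cohort_to   = 1950L,
  sex         = "F",
  origin      = "Total",
  stat        = c("women_total", "children_total", "parity_0", "parity_1",
                  "parity_2", "parity_3", "parity_unknown"),
  value       = c(100, 180, 15, 25, 40, 18, 2),
  stringsAsFactors = FALSE
)

# Drop rows with an unknown birth cohort (labelled -1).
cfe <- cfe[cfe$cohort_from != -1, ]

# Keep only parity counts with a known number of children
# (this excludes the totals and parity_unknown).
is_parity <- grepl("^parity_[0-9]+$", cfe$stat)
parity <- cfe[is_parity, c("stat", "value")]

# Share of respondents at each parity, among those with known parity.
parity$share <- parity$value / sum(parity$value)
childless_share <- parity$share[parity$stat == "parity_0"]
print(parity)
```

Whether respondents with unknown parity should be excluded from the denominator, as done here, is an analytical choice and not something the database prescribes.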
The database does not list totals by cohort, by education, or by sex; totals by place of origin (origin = Total) appear only when origin is not distinguished in the source. Any other total can be computed as the sum of all specified cases.
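Summing over a dimension that has no precomputed total can be sketched in R as follows. The rows and counts are invented for illustration, and the education labels edu_A and edu_B are hypothetical placeholders, not actual codes from the database.

```r
# Toy rows: one cohort, one sex, two education groups.
# Counts are invented; "edu_A"/"edu_B" are hypothetical placeholder labels.
cfe <- data.frame(
  cohort_from = 1946,
  cohort_to   = 1950,
  edu_eurrep  = c("edu_A", "edu_A", "edu_B", "edu_B"),
  sex         = "F",
  origin      = "Total",
  stat        = c("parity_0", "parity_1", "parity_0", "parity_1"),
  value       = c(10, 30, 20, 40),
  stringsAsFactors = FALSE
)

# Total over education: sum the value column within each combination
# of cohort, sex and indicator.
totals <- aggregate(value ~ cohort_from + cohort_to + sex + stat,
                    data = cfe, FUN = sum)
print(totals)
```

The same pattern applies to totals over cohort or sex: leave the dimension to be summed over out of the right-hand side of the formula.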
The source data are extracted directly from the survey or census records, with very few, if any, computations.