The data are submitted in a pre-harmonised form, prepared and organised according to the GGS standards. The next important procedure that the data go through is harmonisation. Harmonisation aims at a clear and comparable format of the GGS micro-data files that is adequate for cross-country comparison. The harmonisation procedure consists of the following steps:
- Label checks
A comparative dataset requires variables that are consistent across countries. This step therefore makes sure that all variables are named the same across countries and that each refers to a particular question in the GGS Questionnaire. The value labels must also be coded in the same manner for all GGP participating countries.
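As an illustration only, the label check can be sketched as a comparison of each country's variable names and value labels against a reference codebook. The sketch below is a minimal example in Python, not the GGP tooling: the function `check_labels`, the variable `example_var` and all value labels are assumptions made for the example.

```python
# Reference codebook: variable name -> {code: value label}.
# The labels are illustrative placeholders, not the official GGS codebook.
REFERENCE = {
    "ahg3_1": {1: "respondent", 2: "partner", 3: "child"},
    "example_var": {1: "yes", 2: "no"},
}


def check_labels(reference, country_meta):
    """Compare one country's variable names and value labels to the reference.

    Returns a list of problem descriptions; an empty list means the file
    is consistent with the reference.
    """
    problems = []
    for var in sorted(set(reference) - set(country_meta)):
        problems.append(f"missing variable: {var}")
    for var in sorted(set(country_meta) - set(reference)):
        problems.append(f"unexpected variable: {var}")
    for var in sorted(set(reference) & set(country_meta)):
        if reference[var] != country_meta[var]:
            problems.append(f"value labels differ: {var}")
    return problems


# Hypothetical country file that matches the reference exactly.
country_a = {var: dict(labels) for var, labels in REFERENCE.items()}

# Hypothetical country file with one renamed variable and one variable
# whose value labels are coded differently.
country_b = {
    "ahg3_1": {1: "respondent", 2: "child", 3: "partner"},
    "example": {1: "yes", 2: "no"},
}

problems_a = check_labels(REFERENCE, country_a)
problems_b = check_labels(REFERENCE, country_b)
```

Returning a list of problems, rather than stopping at the first one, lets a whole country file be reviewed in a single pass.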
- Dealing with grids
The GGS questionnaire holds several grids of either event history information or members of the household. The Household grid, for example, captures the key information of the respondent and members of the respondent's household.
Such data need to be harmonised with specific attention to the order and logical consistency of the grid rows (either household members or events such as births). In the data, each row of the grid is represented by a variable name followed by a subscripted number ("_#"). Each subscript thus represents one household member or one event. Part of the grid harmonisation is therefore grid sorting. Sorting implies that each subscript refers to one particular member of the household and that the grid rows are sorted according to a pre-defined key. In the household grid, the members are sorted according to their relationship to the respondent, i.e. the relation-to-respondent variable (ahg3_# or bhg3_#). Respondents appear first, followed by their partners and children, if any, and then by other household members. As there may be more than one child (or other relative) living in the household, these members also need to be sorted among themselves. In the household grid, age is used as the secondary sorting key (from the oldest person to the youngest).
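The sorting rule described above can be sketched as a two-part sort key: relation to the respondent first, age in descending order second. This is a minimal Python sketch under stated assumptions: the relation categories are written as plain strings instead of the numeric codes stored in ahg3_#, the variable name `age` is hypothetical, and every age is assumed to be known.

```python
# Hypothetical ranking of relationship categories; real data store these as
# numeric codes in ahg3_# / bhg3_#, whose coding is not reproduced here.
RELATION_RANK = {"respondent": 0, "partner": 1, "child": 2, "other": 3}


def sort_household_grid(rows):
    """Sort grid rows by relation to respondent, then by age (oldest first)."""
    return sorted(rows, key=lambda r: (RELATION_RANK[r["relation"]], -r["age"]))


def renumber(rows, relation_var="ahg3", age_var="age"):
    """Write sorted rows back to wide variables with "_#" subscripts."""
    wide = {}
    for i, row in enumerate(rows, start=1):
        wide[f"{relation_var}_{i}"] = row["relation"]
        wide[f"{age_var}_{i}"] = row["age"]
    return wide


# One hypothetical household, in the order the rows were recorded.
household = [
    {"relation": "child", "age": 5},
    {"relation": "other", "age": 70},
    {"relation": "respondent", "age": 40},
    {"relation": "child", "age": 12},
    {"relation": "partner", "age": 38},
]

sorted_rows = sort_household_grid(household)
wide = renumber(sorted_rows)
```

After sorting, subscript 1 always holds the respondent, and the two children occupy subscripts 3 and 4 with the older child first.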
The routing check is a very important step of the harmonisation procedure. It ensures that the structure of the underlying dataset matches the structure of the GGS questionnaire. Its main goal is to code every variable in the dataset as either a valid response, a nonresponse or a skip, as indicated in the questionnaire. Consequently, a skip indicated in the questionnaire is represented by a system-missing code (. in Stata, sysmis in SPSS), while information missing for other reasons is coded as non-applicable/no response (i.e. codes 7, 8, 9 in SPSS or .a, .b, .c in Stata). The routing check therefore examines every cell of the dataset and compares it with its corresponding definition in the questionnaire. Action is taken in a couple of distinct situations. A brief decision process is represented in the diagram.
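The per-cell logic of a routing check can be sketched as follows. This is a hedged Python illustration, not the decision diagram itself: `None` stands in for the system-missing code, the choice of 9 as the nonresponse code is an assumption, the variables `has_children` and `n_children` and the routing rule are hypothetical, and a value found in a skipped question is only flagged because the text does not say how such cases are resolved.

```python
SYSMIS = None      # stands in for "." in Stata / sysmis in SPSS
NO_RESPONSE = 9    # assumed: one of the nonresponse codes 7, 8, 9


def check_cell(value, applicable):
    """Return (coded_value, note) for one cell.

    applicable: True if the questionnaire routing says the question
    should have been asked of this respondent.
    """
    if not applicable:
        if value is SYSMIS:
            return SYSMIS, "ok: skip"
        # How to resolve this case is not specified here; only flag it.
        return value, "flag: value in skipped question"
    if value is SYSMIS:
        return NO_RESPONSE, "recoded: nonresponse"
    return value, "ok: valid response"


def children_count_applies(record):
    """Hypothetical routing rule: asked only if the respondent has children."""
    return record["has_children"] == 1


records = [
    {"has_children": 1, "n_children": 2},
    {"has_children": 1, "n_children": SYSMIS},
    {"has_children": 2, "n_children": SYSMIS},
    {"has_children": 2, "n_children": 3},
]

results = [check_cell(r["n_children"], children_count_applies(r)) for r in records]
```

Keeping the routing rule in its own function means each questionnaire filter can be stated once and reused for every variable it governs.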
The final GGS data harmonisation step is the consolidation of scattered data. The process consolidates information scattered over several variables into a single one. Scattering of information often results from simplifications in questionnaire routing, i.e. paper-and-pencil questionnaire adaptations for easier interviewing. The consolidation procedure is carried out in the Children Section, the Partnership Section and the Parents and Parental Home Section.
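Consolidation of this kind can be pictured as merging several source variables into one target variable. The Python sketch below is an assumption-laden illustration: the variable names `byear_form_a` and `byear_form_b` are hypothetical, and raising an error on conflicting values is one possible policy, not necessarily the one used for the GGS.

```python
def consolidate(record, source_vars):
    """Merge several scattered variables into one value.

    Returns the single non-missing value, None if all sources are missing,
    and raises ValueError if the sources disagree.
    """
    values = {record[v] for v in source_vars if record.get(v) is not None}
    if not values:
        return None
    if len(values) > 1:
        raise ValueError(f"conflicting values in {source_vars}: {sorted(values)}")
    return values.pop()


# Hypothetical case: the same item was recorded in one of two places,
# depending on the path taken through a paper questionnaire.
SOURCES = ["byear_form_a", "byear_form_b"]

records = [
    {"byear_form_a": 1995, "byear_form_b": None},
    {"byear_form_a": None, "byear_form_b": 2001},
    {"byear_form_a": None, "byear_form_b": None},
]

consolidated = [consolidate(r, SOURCES) for r in records]
```

Failing loudly on disagreement keeps inconsistent records visible for manual review instead of silently preferring one source.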
Income is the variable with the highest rate of missing values. Because of its sensitive nature, respondents are reluctant to share income information with the interviewer. To use income information in a cross-country comparative study without losing too many observations, it is necessary to impute the approximately correct distribution of the income variable in each country.
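The text does not name the imputation method, so the following Python sketch uses a simple random hot-deck within each country purely as an illustration of the idea: missing incomes are replaced by values drawn from the observed incomes of the same country, which roughly preserves the within-country distribution. The field names `country` and `income`, the `imputed` flag and the sample values are all hypothetical.

```python
import random


def hot_deck_impute(records, seed=0):
    """Fill missing incomes with random draws from same-country donors.

    This is an illustrative stand-in, not the method used for the GGS.
    Records from a country with no observed income are left missing.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    donors = {}
    for r in records:
        if r["income"] is not None:
            donors.setdefault(r["country"], []).append(r["income"])

    result = []
    for r in records:
        new = dict(r)
        new["imputed"] = False
        pool = donors.get(r["country"])
        if r["income"] is None and pool:
            new["income"] = rng.choice(pool)
            new["imputed"] = True
        result.append(new)
    return result


# Made-up sample values for two made-up countries.
sample = [
    {"country": "A", "income": 1200},
    {"country": "A", "income": None},
    {"country": "A", "income": 1800},
    {"country": "B", "income": 900},
    {"country": "B", "income": None},
]

imputed = hot_deck_impute(sample)
```

Marking imputed cells with a flag lets later analyses include or exclude them explicitly.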
For a more detailed and technical description of the procedure, please refer to the Data Cleaning and Harmonisation Guidelines.
Data Cleaning and Harmonization Guidelines (288.12 kB 2009-08-23 21:00:36)