Data Management Plan

Summary

This data management plan follows the Horizon Europe Data Management Plan template. Data collected within the Generations and Gender Programme (GGP) consists primarily of individuals' responses to personal interviews, administered either face to face or online. The study collects data on between 5,000 and 10,000 respondents per country. The design of the Generations and Gender Survey (GGS) is based on the Fertility and Family Surveys of the UNECE and the World Fertility Survey of the International Statistical Institute. Where possible, the GGS is harmonized and integrated with these historical studies, as well as with similar studies such as the National Survey of Family Growth in the United States.

The data includes sensitive information on topics such as health, finances and values, but does not include personal identifiers such as names, addresses or contact information. Personal contact information and identifiers are stored and processed only by the National Data Collection Partner, which is responsible for drawing the sample and contacting respondents for interview. In some countries this role is performed by the national statistical office and in others by a commercial fieldwork agency. National Data Collection Partners are required to sign a cooperation agreement and a data agreement with the GGP, which ensure that they adhere to the General Data Protection Regulation and ethical best practices.

Data is collected only from respondents who explicitly consent to participate in the survey. Keystroke-level paradata is collected from each respondent in order to monitor data quality and the efficiency of the survey. The survey data is collected using Blaise 5 and is then stored in SPSS and STATA formats, documented using DDI Codebook 2.5. The data amounts to approximately 500 megabytes per country and includes detailed information on individuals' families, relationships, attitudes and socio-economic circumstances, including their personal health and income. The data is made available to social scientists for research purposes.

FAIR data

Findable

The Data Documentation Initiative (DDI) is the international standard for data documentation in the social sciences. DDI standards are applied across datasets and are supported by the most widely used software and technologies in the social sciences. The data will be documented and processed using a DDI Codebook approach. Blaise 5 is DDI compliant, and the GGP will ensure that the DDI structure of the data remains intact throughout the data lifecycle.

Naming conventions and keyword search are adopted in line with the standards of social science research infrastructures. The GGP works closely with other research infrastructures to develop and apply such standards through projects such as SERISS (www.seriss.eu) and SSHOC (https://sshopencloud.eu/). The GGP uses a two-level versioning approach (x.y). An increase in the first level (x) indicates a major change to the data which researchers should not ignore; it renders all prior versions redundant and no longer suitable for future research. An increase in the second level (y) indicates non-critical amendments and updates to the data; researchers should then examine the versioning documentation to assess whether they need to use the newer version.
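The two-level versioning rule can be illustrated with a minimal Python sketch. The function names and the three advice labels are hypothetical and chosen only for illustration; they are not part of any GGP tooling.

```python
from typing import NamedTuple


class DataVersion(NamedTuple):
    """A two-level data version of the form x.y."""
    major: int  # x: a change here makes earlier versions unsuitable
    minor: int  # y: non-critical amendments and updates


def parse_version(text: str) -> DataVersion:
    """Parse a string such as '4.2' into its two integer levels."""
    major, minor = text.strip().split(".")
    return DataVersion(int(major), int(minor))


def update_advice(used: str, latest: str) -> str:
    """Return a (hypothetical) advice label for a researcher.

    'must_update'    : the first level increased, prior versions are redundant
    'review_changes' : only the second level increased, check the
                       versioning documentation before deciding
    'up_to_date'     : no newer version is available
    """
    u, l = parse_version(used), parse_version(latest)
    if l.major > u.major:
        return "must_update"
    if l.major == u.major and l.minor > u.minor:
        return "review_changes"
    return "up_to_date"
```

Parsing both levels as integers means that, for example, version 1.10 is correctly treated as newer than version 1.9, which a plain string comparison would get wrong.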

Once a new version of the data has been processed and is ready for use, it is stored at DANS (https://dans.knaw.nl/en), a trusted repository. DANS holds a Data Seal of Approval, uses DDI metadata standards and generates DOIs for data submissions. There are no costs for archiving at DANS. The data will be findable because DANS has extensive search functions and is part of CESSDA. Access to the data is provided via the GGP website (www.ggp-i.org) rather than through DANS, with applications processed and managed by the UNECE. The UNECE provides this service as a contribution to the GGP, of which it was a founding member.

Accessible

The released data does not contain direct personal identifiers. In the questionnaire, individuals are asked to name friends and relatives to help assess relationships and interconnections. These names are, however, deleted before release and replaced by numeric codes. The data is made available in STATA and SPSS formats, and an online codebook is available via the GGP website using NESSTAR software. This online codebook contains all the information needed for analysis, including all metadata.
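The replacement of named friends and relatives by numeric codes could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the GGP's actual processing code: the field names `respondent_id`, `named_person` and `person_code` are hypothetical, and codes are assumed to be assigned per respondent in order of first mention.

```python
from collections import defaultdict


def replace_names_with_codes(rows):
    """Return copies of `rows` with the name field replaced by a numeric code.

    Assumption: each row is a dict with the hypothetical keys
    'respondent_id' and 'named_person'. The same name mentioned twice
    by the same respondent receives the same code.
    """
    codes_by_respondent = defaultdict(dict)
    released = []
    for row in rows:
        seen = codes_by_respondent[row["respondent_id"]]
        # Normalise spacing and case so that ' anna ' and 'Anna' match.
        key = row["named_person"].strip().casefold()
        code = seen.setdefault(key, len(seen) + 1)
        # Copy every field except the name, then add the numeric code.
        clean = {k: v for k, v in row.items() if k != "named_person"}
        clean["person_code"] = code
        released.append(clean)
    return released
```

Because the mapping from names to codes is kept only in memory and never written to the output, the released rows carry the relationship structure without the names themselves.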

Interoperable

The GGP applies DDI Codebook standards to its data collection and thereby ensures interoperability. This is desirable given the project's reliance on DDI-based technologies such as Blaise and NESSTAR. Where possible, the GGP uses established classifications and ontologies to promote interoperability, including for education levels (ISCED), occupations (ISCO), languages (ISO), countries (ISO) and several Sustainable Development Goals.

Reusable

Access to the data is restricted to verified researchers who are part of a research institute recognized under national guidelines. Researchers are asked to sign a data agreement which stipulates that the data will be used only for research purposes; the verification of researchers is conducted by the UNECE. Commercial use of the data is not permitted, given the assurances made to respondents and in order to respect their wishes. Other than this, no restrictions apply to the use of the data and no embargo will apply. Data will be made available to researchers within 6 months of the end of fieldwork. Data quality checks will be made throughout the fieldwork process and will include the cleaning and validation of all variables within the dataset. The data will be usable indefinitely.

Resources for Management

The processing costs, including data documentation, are covered by the budget of the pilot experiment and the staff time allocated to this task. There are no archiving costs for the data, as DANS is a publicly supported repository which is financed to provide these services. Access management will be supported by the UNECE through its in-kind contribution of staff resources to the GGP. The UNECE provides this service for the GGP project, including for data that extends back to the 1990s. By maintaining continuity in this access role, the GGP can ensure maximal returns on future data collections, since this increases the value of the existing data collection.

Data Security

Personally identifying data is not transferred to the GGP itself and is held securely by in-country fieldwork teams. The data stored by the GGP is sensitive and depersonalised, but not anonymised. During fieldwork this data is collected and stored on a server that is owned and operated by KNAW (the Dutch Royal Academy of Science). Tom Emery and Susana Cabaco of the GGP coordination team are the only individuals with access to this data. On completion of fieldwork, the data will be removed from the fieldwork server, and copies will be held by an international repository such as DANS as well as on two separate servers at KNAW (the parent institute of NIDI). This will ensure that the data is always recoverable.

Ethical Aspects

All respondents in the Generations and Gender Survey are provided with extensive information on the nature of the project and how it processes data. All respondents are required to provide consent to their participation and to being recontacted for follow-up interviews at later times. The Generations and Gender Programme never has access to personal identifiers of respondents. National Data Collection Partners are responsible for storing and maintaining the personal contact information of respondents between waves. All respondents are provided with the contact details of the National Data Collection Partners and can request their removal from the study at any point in time. If such a request is received, the National Data Collection Partner deletes the personal information of the respondent and provides the GGP with the respondent's unique identifier, so that their survey responses can also be deleted from the archived dataset.
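The removal step on the GGP side can be sketched in Python as follows. This is a minimal illustration, assuming the dataset is held as a list of records keyed by a hypothetical `respondent_id` field; it does not describe the actual archive workflow at DANS or the GGP.

```python
def apply_withdrawals(dataset, withdrawn_ids):
    """Drop all records belonging to respondents who asked to be removed.

    `dataset` is assumed to be a list of dicts with a 'respondent_id' key
    (a hypothetical field name). `withdrawn_ids` holds the unique
    identifiers passed on by the National Data Collection Partner.
    Returns the remaining records and the number of records removed.
    """
    withdrawn = set(withdrawn_ids)
    kept = [row for row in dataset if row["respondent_id"] not in withdrawn]
    return kept, len(dataset) - len(kept)
```

Returning the number of removed records gives a simple check that each withdrawal request actually matched data in the archived file.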