Rick J. Mourits, International Institute of Social History
Over the past decades, a wide range of historical demographic datasets have become available. Scholars from Asia, Canada, Europe, South Africa, and the US are well aware of each other’s efforts, as the SSHA and other conferences have been excellent platforms for sharing good practices and building towards better datasets. As a result, researchers from a wide array of countries now have access to high-quality datasets on individual life courses and family connections. The wider availability of international databases has been a huge gain and allows for exciting comparisons between contexts. Yet, even though most data is Findable and Accessible for researchers, comparative research has been limited by the fragmented way in which the datasets were built. Datasets were built at different times, from different sources, by different people who spoke a variety of languages. As a result, the structure in which data is provided varies widely: file formats differ, variable names differ, and categories are unstructured. To optimize our infrastructure for comparative research, our databases need to become Interoperable. The Intermediate Data Structure (IDS) was therefore developed, so that each database is available in a similar data format with standardized variable names. This has made it possible to compare databases on a larger scale (see e.g. Quaranta & Sommerseth, 2018). However, to fully solve issues with Interoperability and Reusability, the information contained in the variables themselves also needs to be standardized.
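To illustrate the difference between standardizing variable names and standardizing the values they hold, here is a minimal Python sketch. It assumes two hypothetical databases whose records already share IDS-style names (`Id_I`, `Type`, `Value`) but label religion differently. The database codes `A` and `B`, the labels, the shared codes, and the `Value_std` field are invented for illustration; they are not part of the IDS or of any existing vocabulary.

```python
# Hypothetical records: variable names are already harmonized (IDS-style),
# but the values still use each database's own labels.
records = [
    {"db": "A", "Id_I": 1, "Type": "RELIGION", "Value": "Rooms-Katholiek"},
    {"db": "B", "Id_I": 7, "Type": "RELIGION", "Value": "RC"},
    {"db": "B", "Id_I": 8, "Type": "RELIGION", "Value": "Lutheran"},
    {"db": "B", "Id_I": 9, "Type": "RELIGION", "Value": "Mennonite"},
]

# Hypothetical shared vocabulary: (database, raw label) -> shared code.
VOCABULARY = {
    ("A", "Rooms-Katholiek"): "CATHOLIC",
    ("B", "RC"): "CATHOLIC",
    ("B", "R.C."): "CATHOLIC",
    ("B", "Lutheran"): "PROTESTANT",
}


def harmonize(records, vocabulary):
    """Return copies of the records with a shared code in 'Value_std'.

    The original 'Value' is kept untouched. Labels without a vocabulary
    entry get None so they can be reviewed rather than silently merged.
    """
    harmonized = []
    for rec in records:
        code = vocabulary.get((rec["db"], rec["Value"].strip()))
        harmonized.append({**rec, "Value_std": code})
    return harmonized


harmonized = harmonize(records, VOCABULARY)
for rec in harmonized:
    print(rec["db"], rec["Value"], "->", rec["Value_std"])
```

The lookup is keyed on the database as well as the raw label because the same string may carry different meanings in different sources. Keeping the original value alongside the shared code preserves the source information for later checks.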
Presented in Session 95. Vocabularies. Exploring Shared Names for Default Variables across Databases