Joe Raad, Vrije Universiteit
Albert Meroño-Peñuela, Vrije Universiteit
Rick J. Mourits, International Institute of Social History
Auke Rijpma, Utrecht University
Schalk Ruben, Utrecht University
Bram Van den Hout, IISG
Kees Mandemakers, Erasmus University Rotterdam
Richard L. Zijdeman, International Institute of Social History
Mass digitisation projects provide historians and social scientists with datasets containing millions of observations on individuals and households. Linking these records is extremely valuable. However, as record linkage moves from thousands to millions of cases, efficient linkage strategies become paramount. In this paper, we present Burgerlinker, our open source tool for matching historical records. The tool 1) is extremely fast and scalable, 2) is designed to match complex first names, and 3) requires no blocking (i.e. no restrictions on registration date, location, or parts of names). Moreover, the detected links contain detailed provenance metadata, can be saved in different formats (CSV and RDF are covered in the current version), and allow for family and life course reconstructions by computing the transitive closure over all detected links. We use the Dutch civil registry to showcase our new matching tool. The birth, marriage, and death certificates from the 19th and early 20th centuries have been digitised to reconstruct families and life courses, resulting in a dataset of 27.5 million certificates. We describe linkage strategies and software that reduce the computing time on this dataset to several hours. We also describe a data model to structure the resulting dataset.
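To illustrate what computing the transitive closure over detected links means for life course reconstruction, the following is a minimal Python sketch using a union-find structure. It is not Burgerlinker's implementation: the function name `transitive_closure` and the record identifiers (such as `birth:1`) are hypothetical and chosen only for illustration. The idea is that if a birth certificate is linked to a marriage certificate, and that marriage certificate is linked to a death certificate, all three end up in the same cluster.

```python
from collections import defaultdict


def transitive_closure(links):
    """Group record ids into clusters connected by pairwise links.

    `links` is an iterable of (record_a, record_b) pairs; the ids used
    here are hypothetical and not Burgerlinker's actual identifiers.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Merge the clusters of every linked pair.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each cluster under its root.
    clusters = defaultdict(set)
    for record in list(parent):
        clusters[find(record)].add(record)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise links between certificates.
example_links = [
    ("birth:1", "marriage:7"),
    ("marriage:7", "death:3"),
    ("birth:2", "death:9"),
]
example_clusters = transitive_closure(example_links)
```

With these example links, `example_clusters` contains two clusters: one with `birth:1`, `marriage:7`, and `death:3`, and one with `birth:2` and `death:9`. Union-find is one common way to compute such clusters in near-linear time, which matters when the number of links runs into the millions.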
Presented in Session 197. Dealing with Data