Progress Report, February 2005
Following a meeting of the NAPP collaborators in October 2002, we began planning for a beta-release of harmonized data in 2003.
We adapted the IPUMS-International extraction software for NAPP and tested its ability to handle the very large extracts that NAPP would be releasing. Following testing of the software and procedures, we released a coded version of the United States data set in July 2003.
NAPP participants met again in Tromsø, Norway in August 2003. We finalized our adaptations to the HISCO occupational classification scheme, and discussed the release of harmonized data from Canada, Great Britain and Norway.
Following completion of cleaning and coding, we released a harmonized version of the 1881 Canadian census data in December 2003. This was the first dataset released to include the harmonized occupational codes.
NAPP participants met in Essex in March 2004. At this meeting we finalized the list of constructed variables that will be included in the final data release in summer 2005, and outlined the complete documentation that will be available for the datasets.
In November 2004, we released harmonized versions of the 1881 census of England and Wales, and the 1900 census of Norway. These censuses took somewhat longer to process than anticipated because of differences in the enumeration of households in Norway. Revised versions of the United States and Canadian data were released in December 2004. These revisions fixed missing data in some variables, and added five new variables for the United States.
We currently plan to release data from the 1881 census of Scotland in March 2005. This census has largely the same format as the census of England and Wales. The only difference between the two enumerations is the geographic organization below the level of registration county.
The three censuses of Iceland that we plan to release have required somewhat more work than we anticipated. Some censuses had been transcribed for all people but were missing variables. Others had all variables transcribed for only a subset of the population. Students from the University of Iceland have undertaken data entry of the missing cases and variables during the summers. We expect to release this data in summer 2005. In total, the three censuses of Iceland that we will release include 270,000 people. Because the datasets are small, we expect processing to be relatively quick once data entry is complete.
The final NAPP data release is scheduled for summer 2005. In the spring of 2005 we are working on the following enhancements to the data and documentation:
- Completing the coding of the United States and British occupational data into the adapted HISCO coding scheme.
- Adding low-level geographic information to the United States data set.
- Constructing additional variables describing household and family relationships for all datasets.
- Substantially expanding the documentation of:
  - Cross-country differences in enumeration procedures
  - Procedural history of the creation and processing of the datasets
  - Occupational classification and coding
  - Constructed family and household relationship variables
Users who are familiar with the IPUMS and IPUMS-International constructed variables and documentation will find these variables replicated in the NAPP datasets in the final release.
Users will find that the documentation on the site will change over the course of spring 2005 as we prepare for our final release. All documentation should be regarded as preliminary until the completion of the project in summer 2005.