Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
Some metadata exports on this page do not work because we are transitioning to a new API, which will eventually offer the same formats as before (DDI-Codebook, Dublin Core, MARC) as well as a new format, DCAT. We apologize for the temporary inconvenience.
NOTE: By downloading ICPSR metadata records, you agree to ICPSR's Conditions of Use regarding those records.
ICPSR provides study-level metadata via OAI-PMH. To use this service:
- Ask your IT staff to install an OAI harvester.
- In the harvester software, enter the base URL and metadataPrefix for the format you wish to download. (See below.)
- Run the software.
Some harvesters run at the Unix/Linux command line; others use simple web interfaces. ICPSR tested its OAI-PMH implementation using jOAI, which has a web interface.
For more information on the standard, please visit the OAI-PMH website. OpenArchives.org also maintains a page on OAI-PMH Tools, which lists a few OAI harvesters.
Technical Details
An OAI-PMH request begins with a base URL, either for study metadata or for related citation metadata:
https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/studies
https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/citations
followed by one to three variables appended to the end of the URL. The three variables are:
- metadataPrefix (format)
- verb
- identifier
Variables are added to the end of the base URL after a question mark. Each variable is written as fieldname=value, and variables are separated by ampersands. So a URL to retrieve the study metadata record for ICPSR 6849 in Dublin Core format would look like this:
https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/studies?verb=GetRecord&metadataPrefix=oai_dc&identifier=6849
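The URL construction described above can be sketched in Python with the standard library's urllib.parse.urlencode, which joins fieldname=value pairs with ampersands and percent-escapes special characters (helpful later for resumptionToken values). The function name build_oai_url and the constant names are hypothetical, chosen for this illustration.

```python
from urllib.parse import urlencode

# Base URLs listed on this page (constant names are hypothetical).
STUDIES_BASE = "https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/studies"
CITATIONS_BASE = "https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/citations"


def build_oai_url(base_url: str, **params: str) -> str:
    """Append OAI-PMH variables to a base URL as ?name=value&name=value."""
    # urlencode keeps the argument order, joins pairs with '&',
    # and percent-escapes any characters that are not URL-safe.
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_oai_url(STUDIES_BASE, verb="GetRecord",
                        metadataPrefix="oai_dc", identifier="6849"))
```

The order of the variables does not matter to an OAI-PMH server; the sketch simply keeps whatever order the caller passes.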
metadataPrefix
The metadataPrefix variable spells out the format of the output. ICPSR supports the following prefixes:
For studies:
- Dublin Core - metadataPrefix=oai_dc
- DDI 2.5 - metadataPrefix=oai_ddi25
- DDI 2.5 with Citations - metadataPrefix=oai_ddi25_citations
- MARC21XML - metadataPrefix=oai_marc
For citations:
- Scholix - metadataPrefix=oai_scholix
Please note the Scholix feed returns only links to publications that have identifiers (DOI, URL, PMCID).
If you would like ICPSR to provide additional formats/objects, please contact us at ICPSR-help@umich.edu.
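Each metadataPrefix determines the XML vocabulary of the records inside the response. As a minimal sketch, assuming a GetRecord or ListRecords response requested with metadataPrefix=oai_dc, the Dublin Core elements can be collected with Python's xml.etree.ElementTree. The namespace URIs are the standard OAI-PMH and Dublin Core ones; the function name dublin_core_fields is hypothetical, and the other prefixes (DDI, MARC21XML, Scholix) would need their own parsing.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Standard namespaces from the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def dublin_core_fields(response_xml: bytes) -> Dict[str, List[str]]:
    """Collect dc:* element text from every record in an oai_dc response."""
    root = ET.fromstring(response_xml)
    fields: Dict[str, List[str]] = {}
    # Each record's payload sits in <metadata><oai_dc:dc>...</oai_dc:dc></metadata>.
    for dc in root.iterfind(".//oai:metadata/oai_dc:dc", NS):
        for elem in dc:
            # elem.tag looks like "{http://purl.org/dc/elements/1.1/}title".
            name = elem.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append((elem.text or "").strip())
    return fields
```

Repeatable elements such as dc:creator are kept as lists, since Dublin Core allows any element to occur more than once.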
verb
The verb variable spells out what kind of result you want to obtain. Not all OAI-PMH verbs are useful with ICPSR's implementation; the useful ones are:
- ListRecords - Returns records 50 at a time. Because ICPSR has over 9000 studies, each partial response ends with a resumptionToken; sending that token back (verb=ListRecords&resumptionToken=<token>) retrieves the next batch, so a script can harvest the entire collection in 50-record increments.
- GetRecord - Returns an individual metadata record; requires an identifier and a metadataPrefix.
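The resumptionToken loop that ListRecords relies on can be sketched as follows. This is a minimal illustration, not ICPSR's or any harvester's actual code: it assumes standard OAI-PMH 2.0 responses, treats a missing or empty resumptionToken as the end of the list (as the OAI-PMH specification describes), and uses hypothetical helper names (http_fetch, list_records). The fetch function is a parameter so the loop can be exercised without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url: str) -> bytes:
    """Download one OAI-PMH response (hypothetical helper, plain urllib)."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def list_records(base_url: str, metadata_prefix: str,
                 fetch: Callable[[str], bytes] = http_fetch) -> Iterator[ET.Element]:
    """Yield every <record> from ListRecords, following resumptionTokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result set, not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # A missing or empty resumptionToken marks the final batch.
        if token is None or not (token.text or "").strip():
            return
        # resumptionToken is exclusive: the follow-up request carries only verb and token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For example, `for record in list_records("https://pcms.icpsr.umich.edu/pcms/api/1.0/oai/studies", "oai_dc"):` would walk the study collection batch by batch. A production harvester would also need retries and logging, which this sketch omits.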
In addition, there are other OAI-PMH verbs that we don't fully utilize:
- Identify - Provides a little information on the OAI-PMH service and repository.
- ListSets - Not used by ICPSR.
- ListIdentifiers - Returns a list of ICPSR identifiers (and the release date for each). Since ICPSR identifiers are just the study numbers, this isn't typically useful.
- ListMetadataFormats - Lists the metadata formats available for a given record (specified with identifier). Since the supported formats are already listed above, this verb is rarely needed.
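To catch malformed requests before sending them, the argument rules for all six verbs can be encoded in a small table. The rules below follow the general OAI-PMH 2.0 specification (nothing ICPSR-specific), and the error labels mirror the protocol's badVerb and badArgument codes; the names VERB_ARGS, RESUMABLE, and request_problems are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (required, optional) arguments per verb, per the OAI-PMH 2.0 specification.
# resumptionToken is handled separately because it is an exclusive argument.
VERB_ARGS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}
# Verbs whose lists may be split across responses.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def request_problems(verb: str, **args: str) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid.

    Note: "from" is a Python keyword, so pass it as **{"from": "2020-01-01"}.
    """
    if verb not in VERB_ARGS:
        return [f"badVerb: {verb}"]
    provided = set(args)
    if "resumptionToken" in provided:
        if verb not in RESUMABLE:
            return [f"badArgument: {verb} does not accept resumptionToken"]
        if provided != {"resumptionToken"}:
            return ["badArgument: resumptionToken is exclusive"]
        return []
    required, optional = VERB_ARGS[verb]
    problems = [f"badArgument: missing {name}" for name in sorted(required - provided)]
    problems += [f"badArgument: unexpected {name}"
                 for name in sorted(provided - required - optional)]
    return problems
```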
identifier
The identifier variable spells out which object you wish to retrieve, in this case a study. ICPSR identifiers are simply study numbers. You can use either the zero-padded 5-digit study number or the unpadded number; for example, both 6849 and 06849 work.
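Because both padded and unpadded study numbers are accepted, a script that gathers study numbers from several sources may want to normalize them before building requests or comparing harvested results. A minimal sketch, with the hypothetical function name study_identifier:

```python
from typing import Union


def study_identifier(study: Union[int, str], padded: bool = False) -> str:
    """Normalize an ICPSR study number for use as the identifier variable."""
    text = str(study).strip()
    # Accept only ASCII digits, e.g. "6849" or "06849".
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a study number: {study!r}")
    number = int(text)  # int() drops any leading zeros
    # Pad to five digits on request (6849 -> "06849"); longer numbers are unchanged.
    return f"{number:05d}" if padded else str(number)
```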
Our citation identifiers are strictly internal, so it's unlikely you'll use them to perform a GetRecord.
Problems
ICPSR can provide some support for OAI-PMH if our server is not responding or the retrieved metadata is not valid. We can also add metadata formats if there is sufficient demand. We cannot provide support for installing or implementing OAI harvesters at your institution.
If you have questions, email us at ICPSR-help@umich.edu.
Testing
The OAI-PMH feed was tested on:
2020-07-16 - Used OAIHarvester2 to download MARC21XML metadata for 10K+ records in under 10 minutes; the leader issue has been resolved and the XML validates.
2020-04-24 - Used OAIHarvester2 to download MARC21XML metadata for 10K+ records in just over four minutes. Discovered issue with leader element; to be resolved by end of May.
2019-12-19 - Used OAIHarvester2 to download Dublin Core metadata for 10K+ records in just over one minute.