📚 Digital Archive Downloader

Harvest metadata by collection from the Tohoku University Digital Archive (or any other OAI-PMH compliant site) and export it as Excel-compatible CSV.

1. Connect to a site

Pick a preset or enter a different OAI-PMH endpoint URL.

2. Pick a collection

Click the collection you want to harvest. Each card shows its size.

3. Choose detail level

If unsure, leave the recommended option selected.

4. Harvest and download

After harvesting, you can preview the records, aggregations, and a yearly histogram, then download the result as Excel-compatible CSV.

⚙️ Advanced: technical notes

This tool talks directly to OAI-PMH endpoints — no HTML scraping. Verbs used: Identify / ListMetadataFormats / ListSets / ListIdentifiers / ListRecords. Pagination uses resumptionToken.
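As a rough illustration of how these requests fit together, the sketch below builds an OAI-PMH request URL and derives the follow-up URL from a response's resumptionToken. It is not the tool's own code: the function names and the example.org endpoint are hypothetical, and it assumes the standard OAI-PMH 2.0 namespace.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 response namespace.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def build_request_url(base_url: str, verb: str, **params: str) -> str:
    """Return a GET URL for one OAI-PMH verb (hypothetical helper)."""
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"


def next_page_url(base_url: str, verb: str, response_xml: str) -> Optional[str]:
    """Return the URL of the next page, or None when the list is complete.

    A follow-up request carries only the verb and the resumptionToken;
    a missing or empty <resumptionToken> marks the last page.
    """
    root = ET.fromstring(response_xml)
    token = root.find(f".//{{{OAI_NS}}}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return build_request_url(base_url, verb, resumptionToken=token.text.strip())
```

For example, `build_request_url("https://example.org/oai", "ListRecords", metadataPrefix="oai_dc")` yields the first ListRecords request, and `next_page_url` is then called on each response until it returns `None`.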

Column names follow {prefix}:{localName}, with @lang appended for elements carrying xml:lang (e.g. dc:title@ja). Multiple values for the same element are joined by newline. Columns starting with _ come from the OAI <header> (identifier, datestamp, setSpec, status).
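The naming rule can be sketched as a small flattening function. This is an illustrative approximation, not the tool's implementation: `flatten_record`, the `PREFIXES` map, and the exact header column spellings (`_identifier`, `_status`, and so on) are assumptions, and only Dublin Core is mapped to a prefix here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumed namespace-to-prefix map; a real harvest would need more entries.
PREFIXES = {"http://purl.org/dc/elements/1.1/": "dc"}


def flatten_record(record: ET.Element) -> Dict[str, str]:
    """Flatten one OAI <record> element into a column -> value mapping."""
    row: Dict[str, List[str]] = {}

    # Header fields become columns starting with "_".
    header = record.find(f"{{{OAI_NS}}}header")
    if header is not None:
        status = header.get("status")
        if status:
            row.setdefault("_status", []).append(status)
        for child in header:
            local = child.tag.split("}", 1)[-1]
            row.setdefault(f"_{local}", []).append((child.text or "").strip())

    # Metadata elements become {prefix}:{localName}, plus @lang if present.
    metadata = record.find(f"{{{OAI_NS}}}metadata")
    if metadata is not None:
        for el in metadata.iter():
            if not el.tag.startswith("{"):
                continue
            uri, local = el.tag[1:].split("}", 1)
            prefix = PREFIXES.get(uri)
            text = (el.text or "").strip()
            if prefix is None or not text:
                continue  # skip containers and unmapped namespaces
            lang = el.get(XML_LANG)
            column = f"{prefix}:{local}" + (f"@{lang}" if lang else "")
            row.setdefault(column, []).append(text)

    # Repeated elements are joined by newline into a single cell.
    return {column: "\n".join(values) for column, values in row.items()}
```

Under these assumptions, a record with one Japanese title and two creators produces the columns `dc:title@ja` and `dc:creator`, with the two creator names separated by a newline in one cell.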

A 200 ms pause is inserted between paginated requests to be polite to the server. Requests are made directly from the browser, so the endpoint must send the CORS header Access-Control-Allow-Origin: *.
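The polite-pause behaviour might look roughly like the loop below. It is a sketch only: `harvest_all` and the `fetch_page` callback are hypothetical names, the callback is assumed to return a page of records together with the next token (or `None` on the last page), and the tool itself runs in the browser rather than in Python.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# One page of results: (records, next resumption token or None).
Page = Tuple[List[str], Optional[str]]


def harvest_all(
    fetch_page: Callable[[Optional[str]], Page],
    pause_seconds: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[str]:
    """Yield every record, pausing between paginated requests."""
    token: Optional[str] = None
    while True:
        records, token = fetch_page(token)
        yield from records
        if not token:
            break  # no token means this was the last page
        sleep(pause_seconds)  # wait before requesting the next page
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays; the pause happens only between pages, never after the final one.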

Works with any CORS-enabled OAI-PMH endpoint. Always check the data provider's terms of use before reusing harvested metadata.