📚 Digital Archive Downloader
Harvest metadata by collection from the Tohoku University Digital Archive (or any other OAI-PMH-compliant site) and export it as Excel-compatible CSV.
1. Connect to a site
Pick a preset or enter a different OAI-PMH endpoint URL.
2. Pick a collection
Click the collection you want to harvest. Each card shows its size.
3. Choose detail level
If unsure, leave the recommended option selected.
4. Harvest and download
After harvesting, you can preview the records, aggregations, and a yearly histogram, then download the result as Excel-compatible CSV.
⚙️ Advanced: technical notes
This tool talks directly to OAI-PMH endpoints and does no HTML scraping. Verbs used: Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords. Pagination uses resumptionToken.
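As a rough illustration of the request pattern described above, the following Python sketch builds an OAI-PMH request URL and reads the resumptionToken from a response. It is not the tool's own browser code; the function names `build_request_url` and `next_token` and the `https://example.org/oai` endpoint are hypothetical. It relies on two points of the OAI-PMH 2.0 protocol: resumptionToken is an exclusive argument (sent with the verb only), and an empty or absent resumptionToken element marks the last page.

```python
from typing import Optional
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def build_request_url(
    base_url: str,
    verb: str,
    metadata_prefix: Optional[str] = None,
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build a GET URL for one OAI-PMH request (hypothetical helper)."""
    params = {"verb": verb}
    if resumption_token:
        # resumptionToken is exclusive: no other arguments accompany it.
        params["resumptionToken"] = resumption_token
    else:
        if metadata_prefix:
            params["metadataPrefix"] = metadata_prefix
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a response, or None on the last page."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None:
        return None
    text = (token.text or "").strip()
    return text or None
```

The first request for a collection would pass `metadata_prefix` and `set_spec`; every later request passes only the token returned by the previous page.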
Column names follow {prefix}:{localName}, with @lang appended for elements carrying xml:lang (e.g. dc:title@ja). Multiple values for the same element are joined by newline. Columns starting with _ come from the OAI <header> (identifier, datestamp, setSpec, status).
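The column-naming rules above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's actual code: the `PREFIXES` table, the fallback prefix `ns`, and the helpers `column_name` and `flatten_record` are hypothetical, and only leaf elements with non-empty text are turned into columns.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Assumed namespace-to-prefix table; a real harvester would need more entries.
PREFIXES = {"http://purl.org/dc/elements/1.1/": "dc"}


def column_name(element: ET.Element) -> str:
    """Build {prefix}:{localName}, plus @lang when xml:lang is present."""
    if element.tag.startswith("{"):
        uri, _, local = element.tag[1:].partition("}")
        name = f"{PREFIXES.get(uri, 'ns')}:{local}"
    else:
        name = element.tag
    lang = element.get(XML_LANG)
    return f"{name}@{lang}" if lang else name


def _append(row: dict, key: str, value: str) -> None:
    # Repeated values for the same column are joined by a newline.
    row[key] = f"{row[key]}\n{value}" if key in row else value


def flatten_record(record_xml: str) -> dict:
    """Flatten one OAI <record> into a single row of column -> text."""
    record = ET.fromstring(record_xml)
    row: dict = {}
    header = record.find(f"{OAI_NS}header")
    if header is not None:
        if header.get("status"):
            row["_status"] = header.get("status")
        for child in header:
            text = (child.text or "").strip()
            if text:
                # Header fields become columns starting with an underscore.
                _append(row, "_" + child.tag.rpartition("}")[2], text)
    metadata = record.find(f"{OAI_NS}metadata")
    if metadata is not None:
        for element in metadata.iter():
            text = (element.text or "").strip()
            if text and len(element) == 0:  # leaf elements only
                _append(row, column_name(element), text)
    return row
```

For a record with one `dc:title` carrying `xml:lang="ja"` and two `dc:creator` elements, this produces a `dc:title@ja` column and a single `dc:creator` cell with the two names on separate lines.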
A 200 ms pause is inserted between paginated requests to be polite to the server. Because requests are made directly from the browser, the endpoint must send the CORS header Access-Control-Allow-Origin: *.
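A minimal Python sketch of the paced pagination loop is shown below. It is an illustration rather than the tool's browser implementation (CORS does not apply outside a browser). The names `harvest_pages` and `fetch_page` are hypothetical: `fetch_page` is assumed to take the previous resumptionToken (or None for the first request) and return the page's records together with the next token.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# Assumed shape of one page: (records, next resumptionToken or None).
Page = Tuple[List[str], Optional[str]]


def harvest_pages(
    fetch_page: Callable[[Optional[str]], Page],
    pause_seconds: float = 0.2,  # the 200 ms pause between requests
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[str]]:
    """Yield one list of records per page, pausing between requests."""
    token: Optional[str] = None
    while True:
        records, token = fetch_page(token)
        yield records
        if not token:
            # No token means this was the last page; no pause is needed.
            break
        sleep(pause_seconds)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays, for example by handing in a list's `append` method and checking how many pauses were requested.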