Will You Only Harvest Some?
The Digital Library for Information Science and Technology has announced DL-Harvest, an OAI-PMH service provider that harvests and makes searchable metadata about information science materials from the following archives and repositories:
- ALIA e-prints
- arXiv
- Caltech Library System Papers and Publications
- DLIST
- Documentation Research and Training Centre
- DSpace at UNC SILS
- E-LIS
- Metadata of LIS Journals
- OCLC Research Publications
- OpenMED@NIC
- WWW Conferences Archive
DL-Harvest is a much-needed, innovative discipline-based search service. Big kudos to all involved.
DLIST also just announced the formation of an advisory board.
The following musings, inspired by the DL-Harvest announcement, are not intended to detract from the fine work that DLIST is doing or from the very welcome addition of DL-Harvest to their service offerings.
Discipline-focused metadata can be harvested relatively easily from OAI-PMH-compliant systems that are organized along disciplinary lines (e.g., the entire archive/repository is discipline-based or an organized subset is discipline-based). No doubt these are very rich, primary veins of discipline-specific information, but how about the smaller veins and nuggets that are hard to identify and harvest because they are in systems or subsets that focus on another discipline?
Here’s an example. An economist, who is not part of a research center or other group that might have its own archive, writes extensively about the economics of the scholarly publishing business. This individual’s papers end up in the economics department section of his or her institutional repository and in EconWPA. They are highly relevant to librarians and information scientists, but will their metadata records be harvested via OAI-PMH for use in services like DL-Harvest, given that they sit in the wrong conceptual bins (e.g., the wrong set, in the case of the IR)?
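To make the "wrong bin" problem concrete, here is a rough Python sketch, not a description of how DL-Harvest actually works. The first function builds a standard OAI-PMH ListRecords request that can be limited to one set; the second is a purely hypothetical keyword filter over unqualified Dublin Core records, the kind of crude pass a harvester might run over a whole repository to catch papers filed under another discipline. The function names, the base URL, and the keyword list are all assumptions made for illustration.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Selective harvesting: only records in this set come back.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def matching_identifiers(response_xml, keywords):
    """Return identifiers of records whose title, subject, or
    description mentions any keyword (hypothetical, naive filter)."""
    root = ET.fromstring(response_xml)
    hits = []
    for record in root.iter(f"{{{OAI_NS}}}record"):
        identifier = record.findtext(
            f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier"
        )
        # Gather the free-text fields most likely to reveal the topic.
        text = " ".join(
            (el.text or "").lower()
            for tag in ("title", "subject", "description")
            for el in record.iter(f"{{{DC_NS}}}{tag}")
        )
        if any(k.lower() in text for k in keywords):
            hits.append(identifier)
    return hits
```

A harvester that requests only a library-science set never sees the economist's papers. Dropping the set argument and filtering the full harvest by keyword would catch some of them, at the cost of pulling far more records and depending on how well the metadata describes each paper.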
Coleman et al. point to one solution in their intriguing "Integration of Non-OAI Resources for Federated Searching in DLIST, an Eprints Repository" paper. But (lots of hand waving here), even if automatic metadata extraction were an easy and simple way to supplement conventional OAI-PMH harvesting, the bottom-line question would be: how good is good enough? In other words, what’s an acceptable level of accuracy for the automatic metadata extraction? (I won’t even bring up the dreaded "controlled vocabulary" notion.)
No doubt this problem falls under the 80/20 Rule, and the 20 is most likely in the low-hanging fruit OAI-PMH-wise, but wouldn’t it be nice to have more fruit?