Will You Only Harvest Some?

The Digital Library for Information Science and Technology has announced DL-Harvest, an OAI-PMH service provider that harvests and makes searchable metadata about information science materials from the following archives and repositories:

  • ALIA e-prints
  • arXiv
  • Caltech Library System Papers and Publications
  • DLIST
  • Documentation Research and Training Centre
  • DSpace at UNC SILS
  • E-LIS
  • Metadata of LIS Journals
  • OCLC Research Publications
  • OpenMED@NIC
  • WWW Conferences Archive

DL-Harvest is a much needed, innovative discipline-based search service. Big kudos to all involved.

DLIST also just announced the formation of an advisory board.

The following musings, inspired by the DL-Harvest announcement, are not intended to detract from the fine work that DLIST is doing or from the very welcome addition of DL-Harvest to their service offerings.

Discipline-focused metadata can be relatively easily harvested from OAI-PHM-compliant systems that are organized along disciplinary lines (e.g., the entire archive/repository is discipline-based or an organized subset is discipline-based). No doubt these are very rich, primary veins of discipline-specific information, but how about the smaller veins and nuggets that are hard to identify and harvest because they are in systems or subsets that focus on another discipline?

Here’s an example. An economist, who is not part of a research center or other group that might have its own archive, writes extensively about the economics of the scholarly publishing business. This individual’s papers end up in the economics department section of his or her institutional repository and in EconWPA. They are highly relevant to librarians and information scientists, but will their metadata records be harvested for use in services like DL-Harvest using OAI-PMH since they are in the wrong conceptual bins (e.g., set in the case of the IR)?

Coleman et al. point to one solution in their intriguing "Integration of Non-OAI Resources for Federated Searching in DLIST, an Eprints Repository" paper. But (lots of hand waving here), if using automatic metadata extraction was an easy and simple way to supplement conventional OAI-PMH harvesting, the bottom line question is: how good is good enough? In other words, what’s an acceptable level of accuracy for the automatic metadata extraction? (I won’t even bring up the dreaded "controlled vocabulary" notion.)

No doubt this problem falls under the 80/20 Rule, and the 20 is most likely in the low hanging fruit OAI-PMH-wise, but wouldn’t it be nice to have more fruit?

Latest posts in Disciplinary Archives

Latest posts in E-Prints

Latest posts in Institutional Repositories

Latest posts in OAI-PMH

Latest posts in Open Access

Share and Enjoy:These icons link to social bookmarking sites where readers can share and discover new web pages.
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • scuttle
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb


DigitalKoans

DigitalKoans

Digital Scholarship

Copyright © 2005-2007 by Charles W. Bailey, Jr.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.