Archive for the 'E-Prints' Category

OAIster Hits 10,000,000 Records

Posted in E-Prints, OAI-PMH, Open Access, Scholarly Communication, Search Engines on January 25th, 2007

Excerpt from the press release:

We live in an information-driven world—one in which access to good information defines success. OAIster’s growth to 10 million records takes us one step closer to that goal.

Developed at the University of Michigan’s Library, OAIster is a collection of digital scholarly resources. OAIster is also a service that continually gathers these digital resources to remain complete and fresh. As global digital repositories grow, so do OAIster’s holdings.

Popular search engines don’t have the holdings OAIster does. They crawl web pages and index the words on those pages. It’s an outstanding technique for fast, broad information from public websites. But scholarly information, the kind researchers use to enrich their work, is generally hidden from these search engines.

OAIster retrieves these otherwise elusive resources by tapping directly into the collections of a variety of institutions using harvesting technology based on the Open Archives Initiative (OAI) Protocol for Metadata Harvesting. These can be images, academic papers, movies and audio files, technical reports, books, as well as preprints (unpublished works that have not yet been peer reviewed). By aggregating these resources, OAIster makes it possible to search across all of them and return the results of a thorough investigation of complete, up-to-date resources. . . .

OAIster is good news for the digital archives that contribute material to open-access repositories. "[OAIster has demonstrated that]. . . OAI interoperability can scale. This is good news for the technology, since the proliferation is bound to continue and even accelerate," says Peter Suber, author of the SPARC Open Access Newsletter. As open-access repositories proliferate, they will be supported by a single, well-managed, comprehensive, and useful tool.

Scholars will find that searching in OAIster can provide better results than searching in web search engines. Roy Tennant, User Services Architect at the California Digital Library, offers an example: "In OAIster I searched ‘roma’ and ‘world war,’ then sorted by weighted relevance. The first hit nailed my topic—the persecution of the Roma in World War II. Trying ‘roma world war’ in Google fails miserably because Google apparently searches ‘Rome’ as well as ‘Roma.’ The ranking then makes anything about the Roma people drop significantly, and there is nothing in the first few screens of results that includes the word in the title, unlike the OAIster hit."

OAIster currently harvests 730 repositories from 49 countries on 6 continents. In three years, it has more than quadrupled in size and increased from 6.2 million to 10 million in the past year. OAIster is a project of the University of Michigan Digital Library Production Service.

Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

ScientificCommons.org: Access to Over 13 Million Digital Documents

Posted in E-Prints, OAI-PMH, Open Access, Scholarly Communication, Search Engines on January 19th, 2007

ScientificCommons.org is an initiative of the Institute for Media and Communications Management at the University of St. Gallen. It indexes both metadata and full-text from global digital repositories. It uses OAI-PMH to identify relevant documents. The full-text documents are in PDF, PowerPoint, RTF, Microsoft Word, and Postscript formats. After being retrieved from their original repository, the documents are cached locally at ScientificCommons.org. It has indexed about 13 million documents from over 800 repositories.

Here are some additional features from the About ScientificCommons.org page:

Identification of authors across institutions and archives: ScientificCommons.org identifies authors and assigns them their scientific publications across various archives. Additionally the social relations between the authors will be extracted and displayed. . . .

Semantic combination of scientific information: ScientificCommons.org structures and combines the scientific data to knowledge areas with Ontology’s. Lexical and statistical methods are used to identify, extract and analyze keywords. Based on this processes ScientificCommons.org classifies the scientific data and uses it e.g. for navigational and weighting purposes.

Personalization services: ScientificCommons.org offers the researchers the possibilities to inform themselves about new publications via our RSS Feed service. They can customize the RSS Feed to a special discipline or even to personalized list of keywords. Furthermore ScientificCommons.org will provide an upload service. Every researcher can upload his publication directly to ScientificCommons.org and assign already existing publications at ScientificCommons.org to his own researcher profile.

Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

Notre Dame Institutional Digital Repository Phase I Final Report

Posted in DSpace, E-Prints, Electronic Theses and Dissertations, Institutional Repositories, Open Access, Scholarly Communication on January 16th, 2007

The University of Notre Dame Libraries have issued a report about their year-long institutional repository pilot project. There is an abbreviated HTML version and a complete PDF version.

From the Executive Summary:

Here is the briefest of summaries regarding what we did, what we learned, and where we think future directions should go:

  1. What we did—In a nutshell we established relationships with a number of content groups across campus: the Kellogg Institute, the Institute for Latino Studies, Art History, Electrical Engineering, Computer Science, Life Science, the Nanovic Institute, the Kaneb Center, the School of Architecture, FTT (Film, Television, and Theater), the Gigot Center for Entrepreneurial Studies, the Institute for Scholarship in the Liberal Arts, the Graduate School, the University Intellectual Property Committee, the Provost’s Office, and General Counsel. Next, we collected content from many of these groups, "cataloged" it, and saved it into three different computer systems: DigiTool, ETD-db, and DSpace. Finally, we aggregated this content into a centralized cache to provide enhanced browsing, searching, and syndication services against the content.
  2. What we learned—We essentially learned four things: 1) metadata matters, 2) preservation now, not later, 3) the IDR requires dedicated people with specific skills, 4) copyright raises the largest number of questions regarding the fulfillment of the goals of the IDR.
  3. Where we are leaning in regards to recommendations—The recommendations take the form of a "Chinese menu" of options, and the options are be grouped into "meals." We recommend the IDR continue and include: 1) continuing to do the Electronic Theses & Dissertations, 2) writing and implementing metadata and preservation policies and procedures, 3) taking the Excellent Undergraduate Research to the next level, and 4) continuing to implement DigiTool. There are quite a number of other options, but they may be deemed too expensive to implement.
Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

Will Self-Archiving Cause Libraries to Cancel Journal Subscriptions?

Posted in E-Journals, E-Prints, Institutional Repositories, Libraries, Open Access, Publishing, Scholarly Communication on December 21st, 2006

There has been a great deal of discussion of late about the impact of self-archiving on library journal subscriptions. Obviously, this is of great interest to journal publishers who do not want to wake up one morning, rub the sleep from their eyes, and find out over their first cup of coffee at work that libraries have en masse canceled subscriptions because a "tipping point" has been reached. Likewise, open access advocates do not want journal publishers to panic at the prospect of cancellations and try to turn back the clock on liberal self-archiving policies. So, this is not a scenario that any one wants, except those who would like to simply scrap the existing journal publishing system and start over with a digital tabula rosa.

So, deep breath: Is the end near?

This question hinges on another: Will libraries accept any substitute for a journal that does not provide access to the full, edited, and peer-reviewed contents of that journal?

If the answer is "yes," publishers better get out their survival kits and hunker down for the digital nuclear winter or else change business practices to embrace the new reality. Attempts to fight back by rolling back the clock may just make the situation worse: the genie is out of the bottle.

If the answer is "no," preprints pose no threat, but postprints may under some difficult to attain circumstances.

It is unlikely that a critical mass of author created postprints (i.e., author makes the preprint look like the postprint) will ever emerge. Authors would have to be extremely motivated to have this occur. If you don’t believe me, take a Word file that you submitted to a publisher and make it look exactly like the published article (don’t forget the pagination because that might be a sticking point for libraries). That leaves publisher postprints (generally PDF files).

For the worst to happen, every author of every paper published in a journal would have to self-archive the final publisher PDF file (or the publishers themselves would have to do it for the authors under mandates).

But would that be enough? Wouldn’t the permanence and stability of the digital repositories housing these postprints be of significant concern to libraries? If such repositories could not be trusted, then libraries would have to attempt to archive the postprints in question themselves; however, since postprints are not by default under copyright terms that would allow this to happen (e.g., they are not under Creative Commons Licenses), libraries may be barred from doing so. There are other issues as well: journal and issue browsing capabilities, the value-added services of indexing and abstracting services, and so on. For now, let’s wave our hands briskly and say that these are all tractable issues.

If the above problems were overcome, a significant one remains: publishers add value in many ways to scholarly articles. Would libraries let the existing system of journal publishing collapse because of self-archiving without a viable substitute for these value-added functions being in place?

There have been proposals for and experiments with overlay journals for some time, as well other ideas for new quality control strategies, but, to date, none have caught fire. Old-fashioned peer review, copy editing and fact checking, and publisher-based journal design and production still reign, even among the vast majority of e-journals that are not published by conventional publishers. In the Internet age, nothing technological stops tens of thousands of new e-journals using open source journal management software from blooming, but they haven’t so far, have they? Rather, if you use a liberal definition of open access, there are about 2,500 OA journals—a significant achievement; however, there are questions about the longevity of such journals if they are published by small non-conventional publishers such as groups of scholars (e.g., see "Free Electronic Refereed Journals: Getting Past the Arc of Enthusiasm"). Let’s face it—producing a journal is a lot of work, even a small journal that only publishes less than a hundred papers a year.

Bottom line: a perfect storm is not impossible, but it is unlikely.

Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

Results from the DSpace Community Survey

Posted in DSpace, E-Prints, Institutional Repositories, Open Access, Scholarly Communication on November 14th, 2006

DSpace conducted an informal survey of its open source community in October 2006. Here are some highlights:

  • The vast majority of respondents (77.6%) used or planned to use DSpace for a university IR.
  • The majority of systems were in production (53.4%); pilot testing was second (35.3%).
  • Preservation and interoperability were the highest priority system features (61.2% each), followed by search engine indexing (57.8%) and open access to refereed articles (56.9%). (Percentage of respondents who rated these features "very important.") Only 5.2% thought that OA to refereed articles was unimportant.
  • The most common type of current IR content was refereed scholarly articles and theses/dissertations (55.2% each), followed by other (48.6%) and grey literature (47.4%).
  • The most popular types of content that respondents were planning to add to their IRs were datasets (53.4%), followed by audio and video (46.6% each).
  • The most frequently used type of metadata was customized Dublin Core (80.2%), followed by XML metadata (13.8%).
  • The most common update pattern was to regularly migrate to new versions; however it took a "long time to merge in my customizations/configuration" (44.8%).
  • The most common types of modification were minor cosmetics (34.5%), new features (26.7%), and significant user interface customization (21.6%).
  • Only 30.2% were totally comfortable with editing/customizing DSpace; 56.9% were somewhat comfortable and 12.9% were not comfortable.
  • Plug-in use is light: for example, 11.2% use SRW/U, 8.6% use Manakin, and 5.2% use TAPIR (ETDs).
  • The most desired feature for the next version is a more easily customized user interface (17.5%), closely followed by improved modularity (16.7%).

For information about other recent institutional repository surveys, see "ARL Institutional Repositories SPEC Kit" and "MIRACLE Project’s Institutional Repository Survey."

Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

MIRACLE Project’s Institutional Repository Survey

Posted in DSpace, E-Prints, EPrints, Fedora, Institutional Repositories, Open Access, Scholarly Communication on September 14th, 2006

The MIRACLE (Making Institutional Repositories A Collaborative Learning Environment) project at the University of Michigan’s School of Information presented a paper at JCDL 2006 titled "Nationwide Census of Institutional Repositories: Preliminary Findings."

MIRACLE’s sample population was 2,147 library directors at four-year US colleges and universities. The paper presents preliminary findings from 273 respondents.

Respondents characterized their IR activities as: "(1) implementation of an IR (IMP), (2) planning & pilot testing an IR software package (PPT), (3) planning only (PO), or (4) no planning to date (NP)."

Of the 273 respondents, "28 (10%) have characterized their IR involvement as IMP, 42 (15%) as PPT, 65 (24%) as PO, and 138 (51%) as NP."

The top-ranked benefits of having an IR were: "capturing the intellectual capital of your institution," "better service to contributors," and "longtime preservation of your institution’s digital output." The bottom-ranked benefits were "reducing user dependence on your library’s print collection," "providing maximal access to the results of publicly funded research," and "an increase in citation counts to your institution’s intellectual output."

On the question of IR staffing, the survey found:

Generally, PPT and PO decision-makers envision the library sharing operational responsibility for an IR. Decision-makers from institutions with full-fledged operational IRs choose responses that show library staff bearing the burden of responsibility for the IR.

Of those with operational IRs who identified their IR software, the survey found that they were using: "(1) 9 for Dspace, (2) 5 for bePress, (3) 4 for ProQuest’s Digital Commons, (4) 2 for local solutions, and (5) 1 each for Ex Libris’ DigiTools and Virginia Tech’s ETD." Of those who were pilot testing software: "(1) 17 for DSpace, (2) 9 for OCLC’s ContentDM, (3) 5 for Fedora, (4) 3 each for bePress, DigiTool, ePrints, and Greenstone, (5) 2 each for Innovative Interfaces, Luna, and ETD, and (6) 1 each for Digital Commons, Encompass, a local solution, and Opus."

In terms of number of documents in the IRs, by far the largest percentages were for less than 501 documents (IMP, 41%; and PPT, 67%).

The preliminary results also cover other topics, such as content recruitment, investigative decision-making activities, IR costs, and IR system features.

It is interesting to see how these preliminary results compare to those of the ARL Institutional Repositories SPEC Kit. For example, when asked "What are the top three benefits you feel your IR provides?," the ARL survey respondents said:

  1. Enhance visibility and increase dissemination of institution’s scholarship: 68%
  2. Free, open, timely access to scholarship: 46%
  3. Preservation of and long-term access to institution’s scholarship: 36%
  4. Preservation and stewardship of digital content: 36%
  5. Collecting, organizing assets in a central location: 24%
  6. Educate faculty about copyright, open access, scholarly communication: 8%
Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.

ARL Institutional Repositories SPEC Kit

Posted in ARL Libraries, Announcements, DSpace, E-Prints, Institutional Repositories, Open Access, Scholarly Communication on August 21st, 2006

The Institutional Repositories SPEC Kit is now available from the Association of Research Libraries (ARL). This document presents the results of a thirty-eight-question survey of 123 ARL members in early 2006 about their institutional repositories practices and plans. The survey response rate was 71% (87 out of 123 ARL members responded). The front matter and nine-page Executive Summary are freely available. The document also presents detailed question-by-question results, a list of respondent institutions, representative documents from institutions, and a bibliography. It is 176 pages long.

Here is the bibliographic information: University of Houston Libraries Institutional Repository Task Force. Institutional Repositories. SPEC Kit 292. Washington, DC: Association of Research Libraries, 2006. ISBN: 1-59407-708-8.

The members of the University of Houston Libraries Institutional Repository Task Force who authored the document were Charles W. Bailey, Jr. (Chair); Karen Coombs; Jill Emery (now at UT Austin); Anne Mitchell; Chris Morris; Spencer Simons; and Robert Wright.

The creation of a SPEC Kit is a highly collaborative process. SPEC Kit Editor Lee Anne George and other ARL staff worked with the authors to refine the survey questions, mounted the Web survey, analyzed the data in SPSS, created a preliminary summary of survey question responses, and edited and formatted the final document. Given the amount of data that the survey generated, this was no small task. The authors would like to thank the ARL team for their hard work on the SPEC Kit.

Although the Executive Summary is much longer than the typical one (over 5,100 words vs. about 1,500 words), it should not be mistaken for a highly analytic research article. Its goal was to try to describe the survey’s main findings, which was quite challenging given the amount of survey data available. The full data is available in the "Survey Questions and Responses" section of the SPEC Kit.

Here are some quick survey results:

  • Thirty-seven ARL institutions (43% of respondents) had an operational IR (we called these respondents implementers), 31 (35%) were planning one by 2007, and 19 (22%) had no IR plans.
  • Looked at from the perspective of all 123 ARL members, 30% had an operational IR and, by 2007, that figure may reach 55%.
  • The mean cost of IR implementation was $182,550.
  • The mean annual IR operation cost was $113,543.
  • Most implementers did not have a dedicated budget for either start-up costs (56%) or ongoing operations (52%).
  • The vast majority of implementers identified first-level IR support units that had a library reporting line vs. one that had a campus IT or other campus unit reporting line.
  • DSpace was by far the most commonly used system: 20 implementers used it exclusively and 3 used it in combination with other systems.
  • Proquest DigitalCommons (or the Bepress software it is based on) was the second choice of implementers: 7 implementers used this system.
  • While 28% of implementers have made no IR software modifications to enhance its functionality, 22% have made frequent changes to do so and 17% have made major modifications to the software.
  • Only 41% of implementers had no review of deposited documents. While review by designated departmental or unit officials was the most common method (35%), IR staff reviewed documents 21% of the time.
  • In a check all that apply question, 60% of implementers said that IR staff entered simple metadata for authorized users and 57% said that they enhanced such data. Thirty-one percent said that they cataloged IR materials completely using local standards.
  • In another check all that apply question, implementers clearly indicated that IR and library staff use a variety of strategies to recruit content: 83% made presentations to faculty and others, 78% identified and encouraged likely depositors, 78% had library subject specialists act as advocates, 64% offered to deposit materials for authors, and 50% offered to digitize materials and deposit them.
  • The most common digital preservation arrangement for implementers (47%) was to accept any file type, but only preserve specified file types using data migration and other techniques. The next most common arrangement (26%) was to accept and preserve any file type.
  • The mean number of digital objects in implementers’ IRs was 3,844.
Share and Enjoy:
  • blinkbits
  • BlinkList
  • blogmarks
  • co.mments
  • connotea
  • del.icio.us
  • De.lirio.us
  • digg
  • Fark
  • feedmelinks
  • Furl
  • LinkaGoGo
  • Ma.gnolia
  • NewsVine
  • Netvouz
  • RawSugar
  • Reddit
  • description
  • Shadows
  • Simpy
  • Smarking
  • Spurl
  • TailRank
  • Wists
  • YahooMyWeb

Comments closed here. Read and add comments at
http://www.digital-scholarship.org/digitalkoans/.