XML Harvester Configuration File Examples

To harvest a repository, you edit the XML Harvester configuration file to specify the parameters for the harvest, and then run the harvest. (When you apply the same steps to a non-OAI-compliant repository, the process is called "crawling" rather than "harvesting.") Each line in the configuration file is a name-value pair called a "trigger." To load harvested bibliographic records into the Innovative system, see Loading MARC Records From a Harvest.
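
The exact syntax of the configuration file is not described here. The following is therefore only a minimal sketch of how name-value triggers can be read. It assumes, hypothetically, one `name=value` pair per line, with blank lines ignored and lines beginning with `#` treated as comments. The trigger names in the usage lines (`baseURL`, `set`) are invented for illustration and are not necessarily XML Harvester's actual trigger names.

```python
from typing import Dict


def parse_triggers(text: str) -> Dict[str, str]:
    """Parse trigger lines into a dict.

    Hypothetical format (an assumption, not XML Harvester's documented syntax):
    one name=value pair per line, blank lines skipped, '#' lines are comments.
    """
    triggers: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected name=value, got {raw!r}")
        triggers[name.strip()] = value.strip()
    return triggers


if __name__ == "__main__":
    # Hypothetical trigger names and URL, for illustration only.
    sample = "# hypothetical triggers\nbaseURL = http://repo.example/oai\n\nset = example-set\n"
    print(parse_triggers(sample))
    # {'baseURL': 'http://repo.example/oai', 'set': 'example-set'}
```

Splitting on the first `=` keeps any later `=` characters (for example, in a URL query string) inside the value.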

The following examples show how to use XML Harvester for:

  • harvesting the LCPhotographs database of the OAI-compliant Library of Congress (LOC) repository. The LCPhotographs database is one of many thematic collections available for harvest from the LOC repository.
  • harvesting the OAI-compliant Northeastern University institutional repository (IRis). This repository is created and managed with Innovative Content Pro / IRX. The IRis database is the only database available for harvest in this repository.
  • crawling a non-OAI-compliant internal repository. To crawl such a repository, you must know its URL, and its data must be in a format that XML Harvester supports.
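
For the two OAI-compliant repositories, harvesting relies on requests defined by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The sketch below builds an OAI-PMH `ListRecords` request URL to illustrate the protocol in general; it is not how XML Harvester builds its requests internally. The base URL and set name are hypothetical placeholders. A thematic collection such as LCPhotographs may correspond to an OAI-PMH set, but its actual set name is not given here.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: Optional[str] = None,
    set_spec: Optional[str] = None,
    resumption_token: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    In OAI-PMH 2.0, resumptionToken is an exclusive argument: when it is
    sent, no other argument except verb may accompany it.
    """
    params: Dict[str, str]
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadataPrefix is required for an initial ListRecords request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # selects one collection (set) of the repository
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical base URL and set name, for illustration only.
    print(list_records_url("http://repo.example/oai", "oai_dc", "example-set"))
    # http://repo.example/oai?verb=ListRecords&metadataPrefix=oai_dc&set=example-set
```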

Example: Harvesting the LCPhotographs Database in the Library of Congress Repository

To harvest the Library of Congress Photographs (LCPhotographs) database in the LOC repository:

  1. Edit the configuration file by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Edit Configuration File.

      The configuration file displays.

      [Figure: configuration file with the triggers for the LCPhotographs database of the Library of Congress repository]
    3. Select and edit text in the file.
    4. Save your edits.
  2. Harvest the repository by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Execute | Harvest XML Records.
    3. Choose Start to begin the harvest.
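
Choosing Start runs the harvest. XML Harvester's internal logic is not documented here, but an OAI-PMH harvest generally repeats `ListRecords` requests, following each `resumptionToken` until the repository returns an empty one. The sketch below shows that loop with the network call left to a caller-supplied `fetch` function (a hypothetical parameter), so it can be tried against canned responses. OAI-PMH error responses such as `noRecordsMatch` are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest(
    fetch: Callable[[Dict[str, str]], str],
    metadata_prefix: str,
    set_spec: Optional[str] = None,
) -> Iterator[ET.Element]:
    """Yield every <record> across a sequence of OAI-PMH ListRecords responses.

    `fetch` is a hypothetical caller-supplied function that sends one request
    with the given OAI-PMH arguments and returns the XML response text.
    """
    params: Dict[str, str] = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    while True:
        root = ET.fromstring(fetch(params))
        yield from root.iterfind("oai:ListRecords/oai:record", OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        # An absent or empty resumptionToken means the list is complete.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests send only the verb and the resumption token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

In practice, `fetch` could wrap `urllib.request.urlopen` with the repository's base URL; keeping it as a parameter separates the paging logic from the network.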

Example: Harvesting the IRis Repository at Northeastern University

To harvest the IRis repository at Northeastern University:

  1. Edit the configuration file by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Edit Configuration File.

      The configuration file displays.

      [Figure: configuration file with the triggers for the IRis repository at Northeastern University]
    3. Select and edit text in the file.
    4. Save your edits.
  2. Harvest the repository by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Execute | Harvest XML Records.
    3. Choose Start to begin the harvest.
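
Each record returned by an OAI-PMH repository such as IRis carries a `<header>` with an identifier, a datestamp, and optional `setSpec` elements; a header with `status="deleted"` marks a record that the repository has withdrawn. As a minimal sketch (not XML Harvester's own processing), the function below summarizes those header fields from one OAI-PMH `<record>` element. The sample identifier and set name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

OAI = "{http://www.openarchives.org/OAI/2.0/}"


class HeaderInfo(NamedTuple):
    identifier: str
    datestamp: str
    sets: List[str]
    deleted: bool


def read_header(record: ET.Element) -> HeaderInfo:
    """Summarize the OAI-PMH <header> of one harvested <record>."""
    header = record.find(f"{OAI}header")
    if header is None:
        raise ValueError("record has no OAI-PMH header")
    return HeaderInfo(
        identifier=header.findtext(f"{OAI}identifier", default="").strip(),
        datestamp=header.findtext(f"{OAI}datestamp", default="").strip(),
        sets=[s.text.strip() for s in header.findall(f"{OAI}setSpec") if s.text],
        # OAI-PMH flags withdrawn records with status="deleted" on the header.
        deleted=header.get("status") == "deleted",
    )


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = (
        '<record xmlns="http://www.openarchives.org/OAI/2.0/">'
        '<header status="deleted">'
        "<identifier>oai:repo.example:123</identifier>"
        "<datestamp>2010-01-15</datestamp>"
        "<setSpec>example-set</setSpec>"
        "</header></record>"
    )
    print(read_header(ET.fromstring(sample)))
    # HeaderInfo(identifier='oai:repo.example:123', datestamp='2010-01-15', sets=['example-set'], deleted=True)
```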

Example: Crawling a Non-OAI-Compliant Internal Repository

To crawl a non-OAI-compliant internal repository:

  1. Edit the configuration file by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Edit Configuration File.

      The configuration file displays.

      [Figure: configuration file with the triggers for a hypothetical internal repository]
    3. Select and edit text in the file.
    4. Save your edits.
  2. Crawl the repository by following these steps:
    1. In the Data Exchange function, choose XML Harvester from the Select Process drop-down menu.
    2. Choose Tools | Execute | Harvest XML Records.
    3. Choose Start to begin the crawl.
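
How XML Harvester crawls a non-OAI-compliant repository is not described here. As a generic illustration only, the sketch below performs a breadth-first crawl that starts from the repository's URL, collects every `<record>` element, and follows `<link href="...">` elements that stay on the same host. The `<record>` and `<link>` element names and the caller-supplied `fetch` function are hypothetical assumptions; a real repository's data would have to be in a format that XML Harvester supports.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, List
from urllib.parse import urljoin, urlparse


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 100) -> List[ET.Element]:
    """Breadth-first crawl of XML pages on one host, collecting <record> elements.

    The <record> / <link href> page layout is a hypothetical repository format;
    `fetch` is a hypothetical function that returns the XML text at a URL.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    records: List[ET.Element] = []
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        root = ET.fromstring(fetch(url))
        fetched += 1
        records.extend(root.iter("record"))
        for link in root.iter("link"):
            href = link.get("href")
            if not href:
                continue
            target = urljoin(url, href)  # resolve relative links against the current page
            # Stay on the repository's own host and skip pages already queued.
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append(target)
    return records
```

The `max_pages` limit and the same-host check keep a sketch like this from wandering outside the internal repository.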