Data for Research
A Text Mining service from JSTOR

Creating Datasets

The 'Create a Dataset' option on the Data for Research site leads to a form that searches the content available for text mining downloads via the self-service option. The search works just like the Basic Search form on JSTOR.org and shares many of the same filtering features (see our guide to searching JSTOR). For research convenience, we also include our standard option for downloading individual PDFs when an item is available as open access content or when you have access to content licensed by your library. PDF files are not included in dataset downloads.

STEP 1: Define your dataset

Use the filtering features to limit your dataset by content type, publication date range, and subject.

Limiting by journal title:
  1. To limit by title, it’s necessary to include the “jcode” for each title of interest in your search. A jcode is a short code that is unique to each title. For example, the jcode for American Naturalist is “amernatu.”
  2. To find jcodes for the titles, look for the “title ID” in this journal list that corresponds to the journal of interest. Please also check the date range listed for that title; many journals have undergone title changes over the course of their publication histories, and each title change has a different jcode.
  3. Once you’ve located the jcodes for your titles of interest, you can construct a query to limit the dataset by these titles. The query format is: jcode:(amernatu OR northnatu OR southnatu)

Enter the query into the main DfR search box on the results page, and select search. The search results will be limited to the jcodes in the query, and if desired, you can then select other parameters or search within the results for specific keywords.
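If you are working with a long list of titles, the jcode query described above can be assembled with a few lines of Python rather than typed by hand. This is only a minimal sketch: the helper name `build_jcode_query` is our own illustration and not part of any JSTOR tool, and it assumes the parenthesized `OR` format shown above is also accepted for a single jcode.

```python
def build_jcode_query(jcodes):
    """Return a query string in the format jcode:(a OR b OR c)."""
    # Trim whitespace, drop empty entries, and remove duplicates
    # while keeping the order in which the jcodes were given.
    cleaned = list(dict.fromkeys(code.strip() for code in jcodes if code.strip()))
    if not cleaned:
        raise ValueError("at least one jcode is required")
    return "jcode:(" + " OR ".join(cleaned) + ")"


# The three jcodes used as the example in this guide.
query = build_jcode_query(["amernatu", "northnatu", "southnatu"])
print(query)  # jcode:(amernatu OR northnatu OR southnatu)
```

The printed string can then be pasted into the main DfR search box.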

When you search the full text using key terms, results are displayed in order of relevance by default. Relevance on JSTOR combines several factors: more unique terms in the text result in higher scores when searches contain those terms, phrase matches are boosted higher than plain keyword matches, and more recently published content is given a slight boost.

If your results include more than 25,000 documents, your processed dataset will include the first 25,000 as defined by the parameters of your search. If you require more than 25,000 documents or a type of data not available through the self-service site, please contact us at support@jstor.org.
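To see how the limit affects a particular search, you can compare the result count shown on the search page with the 25,000-document cap. The sketch below is just that arithmetic; the function name `documents_in_dataset` is hypothetical, and the result count is a number you read off the results page yourself.

```python
DATASET_LIMIT = 25_000  # self-service cap stated in this guide


def documents_in_dataset(result_count, limit=DATASET_LIMIT):
    """Return (included, excluded) document counts for a search result."""
    included = min(result_count, limit)
    excluded = result_count - included
    return included, excluded


# Example: a search that returns 31,500 documents.
included, excluded = documents_in_dataset(31_500)
print(included, excluded)  # 25000 6500
```

A nonzero second value suggests either narrowing the search with additional filters or contacting support for a larger dataset.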

STEP 2: Submit a self-service dataset request

  1. All dataset requests made via the self-service option require that you are logged into a personal account before submitting the request. Accounts are free and can be registered at any time.
  2. Once you are logged into your account and have defined your dataset, you can click the “Request Dataset” button to begin the request process.
  3. The pop-up window will confirm the parameters of the dataset request. Indicate your acceptance of the Terms & Conditions of Use for JSTOR, which cover the use of datasets from DfR.

STEP 3: Receive dataset

When your dataset has finished processing, you will receive an email from JSTOR Support (support@jstor.org) with one or more links to download your dataset. Depending on the size of the dataset, requests may take up to two hours to process. The link(s) will expire in 60 days, so it’s important to download your datasets promptly. See the technical specifications for JSTOR datasets.
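If you want a reminder of when a download link will stop working, the date can be worked out from the day the email arrived. This sketch assumes the 60 days are counted from the date you received the email, which this guide does not state explicitly; the function name `link_expiry` is our own.

```python
from datetime import date, timedelta

LINK_LIFETIME_DAYS = 60  # expiry period stated in this guide


def link_expiry(received_on, lifetime_days=LINK_LIFETIME_DAYS):
    """Return the date a download link expires, counting from receipt (assumed)."""
    return received_on + timedelta(days=lifetime_days)


# Example: an email received on 15 January 2024.
print(link_expiry(date(2024, 1, 15)))  # 2024-03-15
```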

STEP 4: Share your research results

The Data for Research service is designed to provide easier access to datasets to support new types of digital scholarship and insights. We’d love to hear about the outcome of your project and receive any feedback that can help us improve the service. Please contact support@jstor.org with questions or comments.