Data for Research
A Text Mining service from JSTOR

Dataset Services

JSTOR supports new scholarship and analysis that can be produced by text mining academic content at scale. We have provided researchers with datasets for content in the digital library since the late 1990s. After several years of producing custom datasets through a manual process, we introduced the original Data for Research self-service site in 2008 to provide easier access to datasets for scholars.

The current Data for Research service provides datasets for journals, ebooks, research reports, and pamphlets in the digital library for use in research and teaching. Datasets produced through the service may include metadata, n-grams, and full text. Anyone can request a dataset, whether or not they are affiliated with a participating library. Researchers may create a dataset of up to 25,000 documents (metadata and/or n-grams) using the self-service option, or may obtain larger datasets and full-text datasets by special request. We recommend submitting special requests as early as possible before your project's start date.
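The rule of thumb in the paragraph above can be sketched as a small helper. This is only an illustration: the function name `choose_dataset_option` and its parameters are hypothetical and are not part of any JSTOR tool or API; the 25,000-document limit and the full-text condition are taken from the text above.

```python
# Hypothetical helper; not part of the Data for Research service itself.
SELF_SERVICE_LIMIT = 25_000  # self-service document limit stated above


def choose_dataset_option(num_documents: int, needs_full_text: bool) -> str:
    """Suggest a request route based on dataset size and full-text needs."""
    if num_documents <= 0:
        raise ValueError("num_documents must be positive")
    # Full-text datasets and datasets over the limit go through a special request.
    if needs_full_text or num_documents > SELF_SERVICE_LIMIT:
        return "special request"
    # Metadata and/or n-grams within the limit can use self-service.
    return "self-service"


if __name__ == "__main__":
    print(choose_dataset_option(12_000, needs_full_text=False))
    print(choose_dataset_option(12_000, needs_full_text=True))
    print(choose_dataset_option(40_000, needs_full_text=False))
```

A dataset of exactly 25,000 metadata or n-gram documents is treated here as within the self-service limit, following the "up to 25,000 documents" wording.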

Which dataset option is right for you?

Comparison of the self-service and large/full-text request options

Self-Service

Data types:
  • Metadata + References
  • N-grams
Content:
  • Most archival journals
  • Most ebooks
  • Research reports
  • 19th century pamphlets
Requirements:
  • Free personal account
  • Click-through agreement on request page
Delivery:
  • Delivery in less than one hour
  • 25,000 document limit

Large/full-text request

Data types:
  • Metadata + References
  • N-grams
  • OCR full-text
Content:
  • Most archival journals
  • Open access ebooks
  • Research reports
  • 19th century pamphlets
Requirements:
  • Special request form
  • Brief agreement outlining use of the data
Delivery:
  • Delivery in 3-5 weeks*

* On average, drafting and review of the agreement by the researcher(s) and JSTOR may take 2-4 weeks, depending on the complexity of the request and the number of requests in the queue.

Delivery of the dataset takes approximately one week from the time the signed agreement is received, which together with the agreement stage gives the 3-5 week estimate.