More Than a Million Syllabuses at Your Fingertips

The open floor plan, Ikea-style furnishings and bubbly atmosphere feel more like a Silicon Valley coworking space than a world-class humanities center. At the Stanford Literary Lab, the desks might be littered with hefty tomes, but the laptops show strings of code. The Lit Lab is a stronghold of the Digital Humanities, a field that ties computing to humanistic research.

It also hosts a director of the Open Syllabus Project (OSP), a research project organizing data gathered from over a million syllabuses. By identifying the reading list on each document, the team can count and rank the most-taught books since syllabuses were first stored online.

David McClure, OSP technical director, is a software engineer with a humanities degree. He has spent the last two years in the Lit Lab writing code for the OSP (among other projects). The project began as a research attempt to trace the way that individual academic fields transform over time. Joe Karaganis, OSP project director and vice president of The American Assembly at Columbia University, tells EdSurge, "Teaching is a great metric: a field is what the collective chooses to reproduce."

By looking at the texts that entire generations of college students read, researchers can tell the history of academia itself. Entrepreneurs (and education giants), meanwhile, gain access to a database of ranked “must-reads,” an opportunity to feed software with the assignments of a million faculty members.

#DH and #EdTech, Value in the Overlap

The Digital Humanities (DH) is the hip younger sibling of the traditional humanities and a distant cousin of education technology. Digital humanists create tools like OSP, in this case to research the history of teaching and learning. Educational technologists create tools to facilitate teaching and learning. According to its website, OSP aspires to support both, creating “A platform for the development of new research, teaching and administrative tools.”

Karaganis tells EdSurge, “Universities have an extremely valuable information resource that they've collected but ignored. Now that we've proved it’s valuable, universities now have to decide how to govern the resource. In our view universities should create a commons that can be mined or explored.”

Liberal Bias

The OSP team used machine learning and natural language processing to resurrect syllabuses scattered across the web and imbue them with new value for teachers, researchers and entrepreneurs. By counting the number of syllabuses on which a text appears, OSP calculates a “teaching score.” The more often a text appears, the higher its score.
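To make the idea concrete, here is a minimal sketch of a count-based score like the one described above. It assumes the reading lists have already been extracted from each syllabus; the function name, the variable names and the toy data are hypothetical, and OSP's actual pipeline and scoring may well differ.

```python
from collections import Counter


def teaching_scores(syllabuses):
    """Count, for each text, how many syllabuses list it (assumed scoring rule)."""
    scores = Counter()
    for reading_list in syllabuses:
        # A set makes a text count once per syllabus, however often it is listed.
        scores.update(set(reading_list))
    return scores


# Hypothetical toy data, not OSP records.
sample_syllabuses = [
    ["The Elements of Style", "Republic"],
    ["Republic", "The Communist Manifesto"],
    ["The Elements of Style", "Republic", "Republic"],
]

scores = teaching_scores(sample_syllabuses)

# Rank by score (highest first), breaking ties alphabetically by title.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for title, score in ranked:
    print(f"{score}  {title}")
```

Counting each text once per syllabus, rather than once per mention, keeps a single long reading list from inflating a title's score.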

Topping the list is “The Elements of Style,” with Plato’s “Republic” and “The Communist Manifesto” not far behind. Each entry shows the books most often assigned alongside it, data the team has turned into a myriad of visualizations, including a network graph. The project recounts what teachers teach, and so what the next generation of college grads will read.
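The "assigned with it" relationship can be sketched the same way, by counting how often other texts share a syllabus with a given title. This is an illustration under stated assumptions, not OSP's method: the function name and the sample reading lists (including the titles "Leviathan" and "The Prince") are made up for the example.

```python
from collections import Counter


def co_assigned(syllabuses, title):
    """Count how often each other text shares a syllabus with `title`."""
    counts = Counter()
    for reading_list in syllabuses:
        texts = set(reading_list)
        if title in texts:
            # Tally every other text on the same syllabus.
            counts.update(texts - {title})
    return counts


# Hypothetical toy data, not OSP records.
sample = [
    ["Republic", "Leviathan", "The Prince"],
    ["Republic", "Leviathan"],
    ["The Prince", "The Communist Manifesto"],
]

neighbors = co_assigned(sample, "Republic")
top = neighbors.most_common(1)
print(top)
```

Pairs of titles and their shared-syllabus counts are the kind of data a network visualization could be drawn from, with texts as nodes and counts as edge weights.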

The metadata, which OSP plans to release to the public (date TBD), will be open to researchers and for-profit educational technology companies alike. From journal articles on literary canon development to great book recommendation software, the future looks bright for the data.

Open, you say?

McClure tells EdSurge, “Our own biases are totally in the direction of open. We have no inherent interest in anything being closed.”

At the moment the team will not publish individual syllabuses in the open. The metadata will include texts with their relationships and teaching scores, but not the original documents. McClure says, “Syllabuses are specifically interesting documents in the context of open. They’re in this weird gray zone between being public and not at all public… People put incredible amounts of work into crafting syllabuses: they put hundreds of hours into making them, refining them over time, perfecting them. Syllabuses represent a piece of faculty’s work.” As such, they’re valuable in a big-data world.

The OSP is not the first project to attempt to gather syllabuses together. The syllabus data came primarily from a project led in the early 2000s by Dan Cohen while he was at George Mason University. He scraped the web for links to over a million syllabuses, but lacked the technologies to transform that record into a usable collection. Karaganis says, “We've taken the first steps towards solving what a lot of people have been trying to solve before. We finally had the right resources at the right time.”