Computational Problems in the Digital Humanities

Course Calendar

As the curtain rises on this course, several class meetings remain unstructured. This (hopefully) will allow us to take advantage of a few different kinds of opportunities: topics identified during our discussion, guest speakers, and topics that simply need more time for discussion.


Class Topic Activities
Class 1
Aug 27
Introductions, Two Cultures

Welcome to class.... Today we'll introduce ourselves and talk about where this class should take us. Be prepared to talk a bit about the research questions you're asking these days, and how they might relate to this class. (Research questions like, "What should I be researching?" are perfectly admissible.)

Also, we'll talk about situating ourselves and this class in the context of C.P. Snow's famous "two cultures" lecture. To that end, please read the essay version of his lecture (the link provides both HTML and PDF versions), and reflect on how well the "two cultures" reflects--or doesn't--your scholarly background and aspirations. How are Snow's beliefs and concerns relevant to the contemporary practices of the digital humanities? How are they related to Taylor's slightly more recent perspective?

  1. C. P. Snow, The Two Cultures. The Rede Lecture, 1959.
  2. Mark C. Taylor, "End the University as We Know It," The New York Times, April 26, 2009.

Fill out this very short survey about your interests and experience.

Make sure you've received my email about the CUNY Academic Commons and that you've joined the course group. We'll discuss how to use the Commons for this course.

Class 2
Sept 3
Computing and Humanities:
Fragments of History

Talking about "computational problems in the digital humanities" has a (perhaps) surprisingly long history and complex context. Hockey provides a fairly detailed early history, focusing on text analysis; McCarty provides a more personal long-term perspective. Svensson and Kirschenbaum give more recent "macro level" histories. How are computing (in general) and computer science (as a 'science' or discipline) represented in these histories?

  1. Susan Hockey, "The History of Humanities Computing." In A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.
  2. Willard McCarty, "Getting there from here. Remembering the future of digital humanities." Literary and Linguistic Computing 29(3): 283–306, 2014.
  3. Matthew Kirschenbaum, "What is Digital Humanities and What's it Doing in English Departments?" In Debates in the Digital Humanities.
  4. Patrik Svensson, "Humanities Computing as Digital Humanities." Digital Humanities Quarterly 3(3), 2009.

We'll form research groups today.

Class 3
Sept 17
Images, Vision

Images and their analysis are also the subject of DH inquiry, though this is perhaps a newer flavor of inquiry than text-based inquiry is. The first two papers (and the supporting survey data) describe an interesting proto-collaboration between computer scientists and art historians. Saleh et al describe a computational approach to making claims of influence between artists, based on computational analysis of images of paintings and machine learning applied to features derived from that analysis. Responding to that paper and its popular reception, Spratt and Elgammal reflect on the significance and possibility of deeper collaboration between art historians and computer scientists.

Chung et al and Toler-Franklin et al are fairly technical treatments of how computer vision techniques have been applied to two humanistic problems. Chung et al analyze and classify woodblock prints from a large corpus of ballads held at the Bodleian Library; Toler-Franklin et al apply computer vision and machine learning techniques to help re-assemble shattered classical frescoes.

  1. Saleh et al, "Toward automated discovery of artistic influence," Multimedia Tools and Applications, August 2014.
  2. Emily L. Spratt and Ahmed Elgammal, "The Digital Humanities Unveiled: Perceptions Held by Art Historians and Computer Scientists about Computer Vision Technology," self-published. Also, review the survey results on the project's home page.
  3. Chung et al, "Re-presentations of Art Collections," presented at VisArt 2014 Workshop, "When Computer Vision Meets Art," held in conjunction with ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6, 2014.
  4. Toler-Franklin et al, "Multi-feature matching of fresco fragments," ACM Transactions on Graphics (TOG) 29(6), December 2010.

We'll check in about the groups' research topics, and we'll spend a little time collectively formulating the structure for the "mini-presentations" we'll do in the last few class meetings.

Class 4
Sept 24
A Model Topic: Topic Modeling

So let's look at an actual computational problem: topic modeling. Today's readings illustrate a relatively gentle computer science perspective (in an overview article published in Communications of the ACM) as well as a couple DH-flavored applications, in articles co/authored by David Mimno, a CS/IS professor at Cornell and currently the guardian of the MALLET natural language processing toolkit. Read them in any order you like, paying attention to how computer science and the humanities each manifest in these articles.

  1. David M. Blei, "Probabilistic Topic Models." Communications of the ACM 55(4), 2012
  2. Matthew L. Jockers and David Mimno, "Significant Themes in 19th-Century Literature," Poetics 41(6):750–769, 2013
  3. David Mimno, "Computational historiography: Data mining in a century of classics journals," Journal on Computing and Cultural Heritage (JOCCH) 5(1), 2012


We'll talk about everyone's mini-session topics, and develop a common structure for those sessions (which I'll post). I'll also propose a schedule for the mini-sessions, one that aims to cluster related sessions into one meeting. If that's possible.
Class 5
Oct 1
Text Encoding

As the histories we read a couple weeks ago suggest, text is a central focus of many digital humanists. One major problem is the digital encoding of texts (which can be exceptionally complicated things) in a way that best supports their study and dissemination. This week, we'll focus on the biggest player in humanities text encoding: the Text Encoding Initiative (TEI). Much technical and political effort has gone into producing a text encoding standard for the humanities that seems likely to stand the test of time; "TEI: History" gives a brief history of the development of the TEI, while Renear gives a longer (and somewhat out of date, ending c. 2004) history of the problem of text encoding.

Text encoding stems mainly from concerns about the dissemination and preservation of text, but it also can be used to support analyses that would be extraordinarily difficult to perform on "plain" text. "Text Analysis and the TEI" is an exercise in analyzing heavily-encoded text (in this case, Romeo and Juliet); do the exercise and spend some time playing with the possibilities. Duhaime writes of a larger exploration of the corpus of Shakespeare's plays. What kinds of questions/problems could you imagine exploring with this sort of encoded text?

  1. "TEI: History" from the TEI Consortium
  2. Alan Renear, "Text Encoding," In A Companion to Digital Humanities, ed. Susan Schreibman, Ray Siemens, John Unsworth. Oxford: Blackwell, 2004.
  3. Example: Emily Dickinson Archive
  4. Example: Colonial Despatches
  5. "Text Analysis and the TEI," from the "Hacking the Humanities" course taught by Austin Mason. (Note this updated link for Folger Digital Texts Tag Guide.)
  6. Douglas Duhaime, "Classifying Shakespearean Drama with Sparse Feature Sets."

If your knowledge of XML feels rusty, you may also want to look at the TEI's Gentle Introduction to XML.
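
To make concrete the kind of question that encoded text supports, here is a minimal sketch (not part of the assigned exercise) that counts verse lines per speaker. It assumes a tiny, hand-made fragment in TEI style; the SAMPLE text and the lines_per_speaker helper are hypothetical names invented for illustration. The Folger files used in the exercise are encoded far more densely, so the element paths here would likely need adjusting for them.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, hand-made fragment in TEI style (an assumption for
# illustration; it is not copied from the Folger Digital Texts).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#Romeo"><speaker>ROMEO</speaker>
      <l>But soft, what light through yonder window breaks?</l>
      <l>It is the east, and Juliet is the sun.</l>
    </sp>
    <sp who="#Juliet"><speaker>JULIET</speaker>
      <l>O Romeo, Romeo, wherefore art thou Romeo?</l>
    </sp>
    <sp who="#Romeo"><speaker>ROMEO</speaker>
      <l>Shall I hear more, or shall I speak at this?</l>
    </sp>
  </body></text>
</TEI>"""

# TEI elements live in this XML namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def lines_per_speaker(xml_text):
    """Count <l> (verse line) elements inside each <sp> (speech), keyed by speaker."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sp in root.iterfind(".//tei:sp", NS):
        # The who attribute points at a character id such as "#Romeo".
        who = sp.get("who", "unknown").lstrip("#")
        counts[who] += len(sp.findall("tei:l", NS))
    return counts


if __name__ == "__main__":
    for speaker, n in lines_per_speaker(SAMPLE).most_common():
        print(speaker, n)
```

Because the markup already says who is speaking and where each line begins and ends, the count needs no guessing about speaker labels or line breaks, which is exactly what makes the same question hard to answer from "plain" text.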


We'll share group topics and check in on project status.
Class 6
Oct 8
Music and Computing

Musicology is embracing digital methods, too; in many ways, the digital representation, manipulation, and analysis of music offers even greater challenges than similar work with text. Papadopoulos gives a (slightly out-of-date) survey of techniques for algorithmically generating musical works. Pugin offers a perspective on current problems in digital musicology. The Music Encoding Initiative provides a history and introduction to some of the challenges of encoding music in rough analogy with the Text Encoding Initiative's work. Pugin et al (Verovio) and meiView illustrate some applications of the MEI. Randolph introduces a completely different sort of computational music challenge.

  1. Laurent Pugin, "The challenge of data in digital musicology," Frontiers in Digital Humanities 2(4).
  2. Papadopoulos et al, "AI Methods for Algorithmic Composition," AISB Symposium on Musical Creativity, 1999.
  3. Example: Music Encoding Initiative, "A Gentle Introduction to MEI."
  4. Example: Pugin et al, "Verovio: A Library for Engraving MEI Music Notation into SVG," 15th International Society for Music Information Retrieval Conference, 2014. See also the Verovio website.
  5. meiView.
  6. David Randolph, "Didactyl: Toward a Useful Computational Model of Piano Fingering," presented at the Chicago Colloquium on Digital Humanities and Computer Science (DHCS) 2014. (If you're especially interested in this, you may want to read a longer report on this research.)
Class 7
Oct 15
Access

One of the digital humanities' general imperatives is improving our access both to archival materials (eg by digitizing texts and publishing them online) and to scholarly materials (eg by publishing in "open access" fora, or by using and producing "open source" code). Spiro argues this impulse is, in part, a defining one for DH.

But are these forms of openness enough? Making sure people of all abilities have access to digital resources is a computational problem that goes well beyond the digital humanities but certainly affects them. How does "data visualization," for example, work for people whose vision is outside 'normal' parameters? Could data visualizations be translated into data visceralizations? Williams suggests that DH projects should hew to principles of "universal design," and reports on one project aiming to "make the digital humanities more open." Heron offers an example of how to think through accessibility in the context of a reasonably complex text-based application game.

  1. Lisa Spiro, "Why Digital Humanities?" (slides).
  2. George Williams, "Disability, Universal Design, and the Digital Humanities," In Debates in the Digital Humanities.
  3. George H. Williams, "Making the Digital Humanities More Open (Final Performance Report)."
  4. Michael James Heron, "A case study into the accessibility of text-parser based interaction," In Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems. (Library access only.)
Class 8
Oct 22
Copyright, Property, Rights, Freedom

As James pointed out, our ability to study things can be limited by our right, or lack thereof, to study them. Which is to say, some limitations on access are not rooted in questions of ability but in political/economic/legal systems which can erect or eradicate barriers to access. (Such as the occasional reading for this class that requires access through the CUNY libraries.) It's amazingly difficult to find readings on this topic, either for academic study in general or for DH in particular. But for starters...

  1. Peter Suber's book Open Access discusses the significance of open access publishing for the academy. Chapter 1 talks about "What is Open Access?" and Chapter 6 talks specifically about "Copyright".
  2. O'Donnell, Casey. Production Protection to Copy(right) Protection: From the 10NES to DVDs. IEEE Annals of the History of Computing 31(3).
  3. Apathy and refunds are more dangerous than Piracy.

As you read these, be thinking about some broader issues (which these readings don't necessarily touch directly) about computing, access, and DH. How can (should?) the DH community agitate for more open access to data? (This engages copyright and DRM but also proprietary restrictions like corporate APIs.) To what extent should we accept the "information landscape" with its barriers and blockages, and how much should we be working to change it? What can DH scholars do to engage these challenges?

Very last-minute addition (perhaps more for posterity): a paper on the implications of DMCA for the practice of science (and, by extension, I think, software-mediated research): Pamela Samuelson, "Anticircumvention Rules: Threat to Science," Science v. 293, 2001, 2028–2031.

Class 9
Oct 29
Mini-Sessions: Chris/ICR; Martin/Image Processing
  1. Sebastiano Impedovo, "More than twenty years of advancements on Frontiers in handwriting recognition," Pattern Recognition 47(3), March 2014, 916–928.
Class 10
Nov 5
Mini-Sessions: Xiuyan/Music Generation; Cong/LDA
Xiuyan assigns us:
  1. A website.
  2. AI Methods for Algorithmic Composition (which we looked at briefly already).
  3. Applying Learning Algorithms to Music Generation.
  4. Discovering Patterns from Large and Dynamic Sequential Data
Class 11
Nov 12
Project Work
Class 12
Nov 19
Mini-Sessions: James/Sentiment Analysis
  1. Twitter mood predicts the stock market: http://arxiv.org/pdf/1010.3003
  2. Twitter Sentiment Classification using Distant Supervision: https://cs.stanford.edu/people/alecmgo/papers/TwitterDistantSupervision09.pdf
  3. How companies can use sentiment analysis to improve their business: http://mashable.com/2010/04/19/sentiment-analysis/#KT7WsdMV7iqU
  4. Sentiment Visualization app: https://www.csc.ncsu.edu/faculty/healey/tweet_viz/tweet_app/
  5. Learning Word Vectors for Sentiment Analysis: http://ai.stanford.edu/~ang/papers/acl11-WordVectorsSentimentAnalysis.pdf
Class 13
Dec 3
Class 14
Dec 10
Guest Speakers. Wrap-up and solution of the world's problems. Topics/Readings TBD
Finals Day
Dec 17
Project Presentations