Clio Wired: Week 6 Reflection

What I thought of the reading:

This week’s readings were enlightening because they demonstrate how digital tools are useful not only in presenting history to the public or other audiences, but also in the process of researching and creating historical scholarship.

Franco Moretti’s Graphs, Maps, and Trees was a nice introduction to what can be done by manipulating and visually presenting historical data. For Moretti, visualizations of trends, patterns, and cycles in literary history do not replace close reading of individual texts. Rather, they add new layers of information, and sometimes even debunk generally held assumptions about literature’s history. Tim Burke praises Moretti’s approach because viewing quantitative data about literature can problematize many commonplace assumptions about it. However, Burke cautions that, while numbers can seem quite concrete and infallible, they can still be misleading. For example, quantifying publication does not actually tell us about readership. He also criticizes Moretti’s lack of emphasis on authors’ agency and on the breaks and ruptures (as opposed to gradual divergence) in literary history. Even so, I think Moretti is still useful in demonstrating how these tools can be used not just in the social and hard sciences, but also in the humanities. Burke’s criticisms show that despite these visualizations’ seeming authoritativeness, the way in which they are interpreted or presented is still quite subjective.

While Moretti mostly deals with publication data for various genres, the rest of the authors focus on data mining specific texts or corpuses of texts in order to analyze them in new ways. Daniel Cohen and Gregory Crane focus on the new scholarly opportunities presented by large digital collections such as Google Books or Project Gutenberg. In conjunction with close examination of a limited number of texts, scholars who use various data mining/text mining tools can, in the words of Cohen, “find patterns, determine relationships, categorize documents, and extract information from massive corpuses.” For example, one might perform a statistical analysis of how often two keywords or phrases appear together, or find specific types of documents (such as syllabi) by assessing frequently used words in these texts.
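To make the idea of keywords appearing together a little more concrete, here is a minimal sketch in Python of the simplest version I can think of: counting how many documents contain both of two words. The function name, the crude tokenizer, and the sample sentences are all my own inventions for illustration; none of this is taken from Cohen's or Crane's actual tools.

```python
import re


def cooccurrence_count(documents, word_a, word_b):
    """Count the documents in which both words appear at least once."""
    word_a, word_b = word_a.lower(), word_b.lower()
    count = 0
    for text in documents:
        # Crude tokenizer: lowercase runs of letters and apostrophes
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        if word_a in tokens and word_b in tokens:
            count += 1
    return count


# Made-up sample "documents", for illustration only
sample_docs = [
    "The king sent his daughter into the forest.",
    "A wolf waited in the forest.",
    "The king and the wolf met in the forest.",
]

print(cooccurrence_count(sample_docs, "king", "forest"))
```

A real project would presumably count within sentences or a sliding window of words rather than whole documents, and would compare the result against how often each word appears on its own.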

Unfortunately, these large digital libraries can have some drawbacks, such as “noise” from incorrect OCR, missing texts due to copyright restrictions or the cost of digitization, and an inability to present or crawl texts in non-Roman alphabets. For these reasons, scholars need to be careful about drawing conclusions from potentially incomplete data sets.

Trying it out myself:

Playing around with some web-based text mining tools, I quickly found that some of them are better suited to entertainment than serious scholarship. Wordle, which generates text clouds of the most frequently used words in a document, creates aesthetically pleasing visualizations. However, aside from giving a general idea about the topics or keywords of a text, I am not sure that this tool has any serious scholarly use. Here is my text cloud for Grimm’s Fairy Tales:

Wordle for Grimm's Fairy Tales

Another tool that was entertaining but probably not statistically sound is Google’s Ngram Viewer. Because you cannot control which texts are included in the analyzed corpus, the data may be misleading. However, for general information rather than scholarly purposes, the Ngram Viewer can give a nice idea of when certain terms may have come in and out of fashion. For example, in the Ngram below, you can see the shift from using the term Great War to the term World War:

Ngram: Great War vs. World War

Because of the user’s ability to choose texts and because of its myriad analytical tools, Voyant was the most promising tool for scholarly research. I chose to analyze the same Grimm’s Fairy Tales text I tried in Wordle, available through Project Gutenberg. I like how the user can manipulate the data provided by Voyant in many ways. Not only can you see the most frequently used words, but you can also compare the frequency of two words against each other and see words in context. Voyant also provides a word cloud, which seems to be generated using a different algorithm than Wordle’s, as they came out differently.
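For anyone curious what "most frequently used words" and "words in context" might look like under the hood, here is a rough sketch in Python. This is only my guess at the general technique, not Voyant's actual code; the function names, the simple tokenizer, and the sample sentence are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(text, n=5):
    """Return the n most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(n)


def keyword_in_context(text, keyword, window=3):
    """Show each occurrence of keyword with `window` words on either side."""
    tokens = tokenize(text)
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Made-up sample text, not a quotation from the Grimm collection
sample = "The wolf said come in, and the girl said no. Then the wolf went away."

print(top_words(sample, 3))
for line in keyword_in_context(sample, "said", window=2):
    print(line)
```

Comparing two words against each other would then just be a matter of looking up both in the same `Counter`, ideally divided by the total number of words so that texts of different lengths can be compared.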

Voyant analysis of Grimm's Fairy Tales

Although I felt like I couldn’t take full advantage of Voyant’s tools since I wasn’t undertaking an actual text-mining project, I did find it interesting that Voyant identified “said” as the most frequently used word in Grimm’s Fairy Tales. This might say something useful about the structure of the tales or how the narrative action is pushed forward. As you can see above, Wordle actually eliminated “said” from its word cloud, perhaps because it is too commonly used; this shows how a lack of control over the algorithms and data behind tools like Wordle and the Ngram Viewer can produce misleading results.
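Here is a small Python sketch of why the two tools could disagree, assuming (and this is only my guess) that the difference comes down to which words each tool silently filters out before counting. The two word lists and the sample text are made up for illustration; I do not know what Wordle's or Voyant's real lists contain.

```python
import re
from collections import Counter

# Hypothetical filter lists; real tools ship much longer ones
BASIC_STOP_WORDS = {"the", "and", "a", "to", "of"}
AGGRESSIVE_STOP_WORDS = BASIC_STOP_WORDS | {"said"}


def most_frequent(text, stop_words=frozenset()):
    """Return the (word, count) pair for the most common word not filtered out."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in stop_words)
    return counts.most_common(1)[0] if counts else None


# Made-up sample text in a fairy-tale style
sample = (
    "The king said yes. The queen said no. "
    "The king said nothing and the king said go."
)

print(most_frequent(sample, BASIC_STOP_WORDS))
print(most_frequent(sample, AGGRESSIVE_STOP_WORDS))
```

With the shorter list, "said" comes out on top; with the longer one it vanishes and "king" takes its place. The text is the same both times, and only the hidden filter changed.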

Filed under Reading and practicum reflection

3 responses to “Clio Wired: Week 6 Reflection”

  1. Stray observation: given that the word cloud is from Grimm fairy tales, it’s really creepy that “little” and “one” are right there next to one another. I also like that “came” and “went” are so prominent. It makes me wonder how often those words appear in other types of fiction. It probably says something about how fairy tale plots are constructed.

  2. Really interesting find re: “said” in Wordle vs. Voyant. I found it helpful that Wordle automatically deleted the “stop words,” when I was fiddling around with it, but you bring up an important point–sometimes these stop words are essential. Who knew!

  3. I think Burke’s criticism that these tools take away the authors’ agency is the most important point. Moretti seems like he is trying to find things that do not change throughout time. For example, he wanted to know the common theme that made one genre fade and a new one emerge at such regular intervals. Isn’t this kind of opposed to the principle of history, which focuses on how the unique historical context affects individual people and actions? Why study history if you can find common rules that apply for all of history?
