What about your internship at RRCHNM has been an eye-opening (new or unexpected) experience? What were your initial expectations? Have these expectations changed now that you are half-way through? How? Why?
In the past few weeks, I have concentrated on exploring the topic modeling possibilities offered by Voyant Tools and MALLET to quantitatively study data from the newsletter Appalachian Trailways News (ATN). The aim of these activities has been to understand how platforms and software provide topic modeling tools for analyzing large volumes of unformatted text. Some of the overarching questions during these exercises include:
- What are the different topic modeling possibilities offered by each software/platform?
- What are some of the positives of using Voyant Tools for topic modeling of the ATN corpus? What are the negatives?
- What are the positives of using MALLET for topic modeling of the ATN corpus? What are the negatives?
- Does the model perform better with fewer topics or more stopwords?
- What are other ways I can tweak the model?
- Am I making the right assumptions? How do I go about modifying them?
These activities have so far yielded two reports (I am working on report #3) with some interesting findings. The Voyant Tools Topics feature was a much more user-friendly introduction to topic modeling. I could see how topics (clusters) were distributed across the ATN data batch. While Voyant Tools allows me to study topic models quickly, it has limitations when it comes to modifying the maximum number of terms per document used for the modeling.
I came into this task with a decent knowledge of the dataset under examination and what I thought was a fair idea of the purpose the model would serve, but I have not been as successful at inferring topics from keywords as I had hoped. On the one hand, I am getting more comfortable using the command line and running some routines; on the other, I feel there is much more I need to learn about topic distribution and frequency, as well as about other ways to train my model. Those will be my next steps during my internship. The reports I have generated, in combination with regular meetings with Dr. Kelly and reading of prior literature, have helped me gain a much greater appreciation for what topic modeling does and where it is best applied.
Outside the AT project, I would like to highlight one of the latest Center activities in which I participated: a brainstorming session for a possible grant proposal for Mellon’s Monuments Project. It was exciting to be part of such a conversation, as I got to hear fresh ideas from others while contributing some of my own. I look forward to continuing my participation.