CS 405 Assignment 7

Storybook Sentiment - Due March 25

The purpose of this assignment is to give you more practice with collections and string processing.

General Functionality

The idea behind sentiment analysis is fairly straightforward. Words like "battered" are generally used in negative situations, and words like "acceptable" are generally used in positive situations. If we compare the ratio of positive words to negative words, we can get a very rough sense of the overall sentiment of a text document.

We can get a more nuanced sentiment if we also allow the words to have magnitudes, which is what the WordNet 3.0 sentiment database allows us to do. I have provided a copy of it here. If you open the provided WordNet database, you will see that it contains data formatted roughly as follows.

Your program should appear as follows.

Both load buttons should bring up a dialog box that lets you navigate and find a file.

Without a dictionary, your program won't run, so if the user tries to load a story before loading a dictionary, your program should produce an error.
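One way to enforce this ordering is sketched below in Python. This is only an illustration: the class name SentimentApp, its method names, and the choice of RuntimeError are all hypothetical, and in your actual program the error would presumably be shown to the user through the interface rather than raised.

```python
class SentimentApp:
    """Hypothetical application state: remembers whether a dictionary is loaded."""

    def __init__(self):
        # None means no dictionary has been loaded yet.
        self.sentiment = None

    def load_dictionary(self, sentiment):
        # 'sentiment' is assumed to be a dict mapping word -> sentiment value.
        self.sentiment = sentiment

    def load_story(self, text):
        # Refuse to process a story until a dictionary is available.
        if self.sentiment is None:
            raise RuntimeError("Load a dictionary before loading a story.")
        return text
```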

Sentiment Calculation

You need to map every word in the WordNet dictionary to its sentiment. Once you have done that, you can process a story to get a sense of the overall sentiment. A story is processed by first identifying all words in the story. A word is defined as a sequence of non-whitespace characters. A whitespace character is a tab, a space, or a newline.
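The word definition above can be sketched in Python as follows. The function name split_words is hypothetical. Note that the sketch follows the definition literally, so punctuation stays attached to a word ("bad." is not the same word as "bad").

```python
import re


def split_words(text):
    """Return the words in text, where a word is a maximal run of
    characters that are not a tab, a space, or a newline."""
    # [^ \t\n]+ matches one or more non-whitespace characters,
    # using exactly the three whitespace characters named above.
    return re.findall(r"[^ \t\n]+", text)
```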

For each word in the story, you are to look up the word in your sentiment dictionary. If the word is not in the sentiment dictionary, you ignore the word. If the word is in the sentiment dictionary and the sentiment is 0, you should also ignore the word. If the word is in the sentiment dictionary and the sentiment is not 0, you are to include the number in the average sentiment.

A simple example is here. The two words in small.txt are "able", which has a sentiment of 0.125, and "unable", which has a sentiment of -0.75. The average of 0.125 and -0.75 is -0.3125, so that is what your program should output.
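The averaging rule can be sketched in Python as shown below, using the small.txt values as a check. The function name average_sentiment is hypothetical, and two details are assumptions because the assignment does not specify them: the sketch returns 0.0 when no word contributes to the average, and it matches words exactly as written, including case.

```python
def average_sentiment(words, sentiment):
    """Average the nonzero sentiment values of the words found in the
    sentiment dictionary (a dict mapping word -> sentiment value)."""
    total = 0.0
    count = 0
    for word in words:
        # Words missing from the dictionary are treated like zero-sentiment
        # words: both are ignored.
        score = sentiment.get(word, 0.0)
        if score != 0:
            total += score
            count += 1
    if count == 0:
        # Assumption: the assignment does not say what to report here.
        return 0.0
    return total / count


small = {"able": 0.125, "unable": -0.75}
print(average_sentiment(["able", "unable"], small))  # prints -0.3125
```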

Loading the Dictionary

The file format of the WordNet dictionary is specified above. Some words have both positive and negative sentiment. If this occurs, you are to use the larger of the two values. For example, the word "living" has both positive sentiment (0.5) and negative sentiment (0.125). For this word, you should use the 0.5 positive sentiment, and ignore the 0.125 negative sentiment.

You will find that some words share a line. For example, "dorsal" and "abaxial" share line 31. If this occurs, you are to use only the first word, and ignore all subsequent words on the same line.
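The two rules in this section can be sketched in Python as shown below. The sketch deliberately does not parse the file: it assumes each line has already been turned into a (positive score, negative score, list of words) tuple, and that representation, along with the function names, is hypothetical. Three further assumptions are labeled in the comments: a negative sentiment is stored with a minus sign (as with "unable" at -0.75), a tie goes to the positive value, and a word that appears on more than one line keeps the value from the last line.

```python
def entry_sentiment(pos_score, neg_score):
    """Pick the larger of the two scores for one dictionary line.
    Both scores are assumed to be given as nonnegative numbers."""
    if pos_score >= neg_score:
        # Assumption: a tie is resolved in favor of the positive value.
        return pos_score
    # Assumption: negative sentiment is stored as a negative number.
    return -neg_score


def build_dictionary(records):
    """Build a word -> sentiment dict from pre-parsed records, where each
    record is a (pos_score, neg_score, words) tuple (hypothetical layout)."""
    sentiment = {}
    for pos_score, neg_score, words in records:
        if not words:
            continue
        # Only the first word on a line is used; the rest are ignored.
        # Assumption: a later line overwrites an earlier one for the same word.
        sentiment[words[0]] = entry_sentiment(pos_score, neg_score)
    return sentiment
```

For "living", entry_sentiment(0.5, 0.125) returns 0.5, which matches the example above.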

Evaluation

Note that Dracula has about 1/4 the positive sentiment of Pride and Prejudice.

Submission Instructions

To submit this program, I would like all students to zip their project and send it to me. Please submit the entire project, which should be named Prog_7_lastname, where lastname is your last name. For me, this is Prog_7_Wilt. Students should email their completed projects to cs405@cs.unh.edu.

Late Policy

The assignment is due prior to midnight on the listed due date. For this assignment, that means you must turn in your solution before March 26.

Unlike other classes you may have taken in the past, no late work is allowed for this class. This is worth repeating, because it is extremely important: no late work is allowed for this class. We will move quickly, and I do not want students straggling behind trying to catch up on work from previous weeks, which is what generally happens when late work is accepted.