Sentiment Analysis of Product Reviews with Python Using NLTK

Here is a brief overview of how to use the Python package Natural Language Toolkit (NLTK) for sentiment analysis of Amazon food product reviews. It is a basic form of text classification: the words in a review are used to help determine whether the review is positive or negative. The following is an excerpt from a more comprehensive tutorial I put together for a workshop for the Syracuse Women in Machine Learning and Data Science group.


The data for this tutorial comes from the Grocery and Gourmet Food Amazon reviews set from Jianmo Ni found at Amazon Review Data (2018). Out of the review categories to choose from, this set seemed like it would have a diverse range of people’s sentiment about food products. The data set itself is fairly large, so I use a smaller subset of 20,000 reviews in the example below.

A preview of the full Grocery and Gourmet Food reviews data set from Amazon shows the available data features.

Steps to clean the main data using pandas are detailed in the Jupyter Notebook. Each review carries an overall rating from 1 to 5, with 1 being the lowest approval and 5 being the highest. I split the data so that reviews rated 1 or 2 are labeled negative and those rated 4 or 5 are labeled positive. I omit ratings of 3 for this exercise because they can lean either negative or positive.
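The labeling rule above can be sketched on a toy data frame. This is only an illustration and not the notebook's actual cleaning code; the toy rows are made up, and the column names `overall`, `reviewText`, and `reaction` are assumed to match the cleaned data described later.

```python
import pandas as pd

# Hypothetical toy data; the real review set has far more rows and columns.
toy = pd.DataFrame({
    "overall": [1, 2, 3, 4, 5],
    "reviewText": ["Awful", "Stale", "It was okay", "Tasty", "Love it"],
})

# Drop the neutral 3-star reviews, then label what is left.
labeled = toy[toy["overall"] != 3].copy()
labeled["reaction"] = labeled["overall"].apply(
    lambda rating: "positive" if rating >= 4 else "negative"
)
print(labeled)
```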

Prepare Data for Classification

Import the necessary packages. The steps below assume the data has already been cleaned using pandas.

import pandas as pd
import random
import string
import nltk
from nltk.tokenize import WhitespaceTokenizer
from nltk.corpus import stopwords
from nltk import classify
from nltk import NaiveBayesClassifier

Load the cleaned data from a CSV file in the data folder using pandas.

reviews = pd.read_csv('data/combined_reviews.csv')

The main cleaned dataframe has three columns: overall, reviewText, and reaction. The overall column has the numeric review rating, the reviewText column has the product reviews as strings, and the reaction column is marked with ‘positive’ or ‘negative’. Each row represents an individual review.

The cleaned pandas dataframe shows the three columns for overall rating, review text, and reaction type for the product reviews.

Reduce the main pandas dataframe to a smaller group by grouping on the reaction column and applying the pandas sample method to each group through a lambda function. I use an even split of 20,000 reviews: 10,000 positive and 10,000 negative.

sample_df = reviews.groupby('reaction').apply(lambda x: x.sample(n=10000)).reset_index(drop = True)
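As a small illustration of balanced sampling, here is a sketch on a toy frame. It uses the groupby `sample` method, which I believe is available in pandas 1.1 and later, in place of `apply` with a lambda. The toy rows and the `random_state` value are assumptions added so the draw is repeatable.

```python
import pandas as pd

# Hypothetical toy frame standing in for the cleaned reviews data.
toy = pd.DataFrame({
    "reviewText": ["Yum", "Great", "Tasty", "Stale", "Bland", "Awful"],
    "reaction": ["positive"] * 3 + ["negative"] * 3,
})

# Draw the same number of rows from each reaction group.
# random_state is an assumption, used only to make the sample repeatable.
balanced = (
    toy.groupby("reaction")
    .sample(n=2, random_state=42)
    .reset_index(drop=True)
)
counts = balanced["reaction"].value_counts().to_dict()
print(counts)
```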

Use this sample dataframe to create a list for each sentiment type. Use the pandas loc indexer to select the rows that have ‘positive’ or ‘negative’ in the reaction column, respectively. Then, use the pandas tolist() method to convert the reviewText column of each selection to a Python list.

pos_df = sample_df.loc[sample_df['reaction'] == 'positive']
pos_list = pos_df['reviewText'].tolist()

neg_df = sample_df.loc[sample_df['reaction'] == 'negative']
neg_list = neg_df['reviewText'].tolist()

With these lists, use the lower() method and a list comprehension to make each review lowercase. This way, differently capitalized forms of the same word (such as ‘Great’ and ‘great’) are counted as one word.

pos_list_lowered = [review.lower() for review in pos_list]
neg_list_lowered = [review.lower() for review in neg_list]

Join each list into a single string to more easily separate words and prepare for more cleaning. For this text classification, we will consider the frequency of words in each type of review, so the boundaries between individual reviews are not needed.

pos_list_to_string = ' '.join([str(elem) for elem in pos_list_lowered])  
neg_list_to_string = ' '.join([str(elem) for elem in neg_list_lowered])
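To make the effect of the last two steps concrete, here is a tiny sketch with two made-up reviews. It shows the lowercasing and the join, and that the result is one long string in which review boundaries are no longer visible.

```python
# Two made-up reviews standing in for a list such as pos_list.
toy_reviews = ["Great coffee!", "Would BUY again"]

# Lowercase each review, then join everything into one string.
lowered = [review.lower() for review in toy_reviews]
joined = " ".join(lowered)

print(joined)  # great coffee! would buy again
```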

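To reduce noise in the data, stop words (examples: ‘and’, ‘how’, ‘but’) should be removed, along with punctuation. Build a set that combines NLTK’s built-in English stop word list with the punctuation characters from Python’s string module. If the stop word corpus is not installed yet, it needs to be downloaded once with nltk.download('stopwords').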

stop = set(stopwords.words('english') + list(string.punctuation))

Create a variable for the tokenizer. Tokenizing separates a string into individual tokens according to a specific rule. In this example, I chose to use a whitespace tokenizer, which splits the text wherever whitespace occurs.

tokenizer = WhitespaceTokenizer()

Use a list comprehension on the positive and negative strings to tokenize them and keep only the tokens that are not stop words or punctuation items.

filtered_pos_list = [w for w in tokenizer.tokenize(pos_list_to_string) if w not in stop] 

filtered_neg_list = [w for w in tokenizer.tokenize(neg_list_to_string) if w not in stop]
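The filtering step can be tried on a single sentence. In this sketch the stop list is a tiny hand-written set, an assumption standing in for stopwords.words('english') so the example runs without the NLTK stop word corpus. Note that punctuation attached to a word survives this step.

```python
import string
from nltk.tokenize import WhitespaceTokenizer

# Tiny hand-written stop list (an assumption for this example only),
# combined with single punctuation characters as in the tutorial.
toy_stop = {"the", "was", "but"} | set(string.punctuation)

tokenizer = WhitespaceTokenizer()
text = "the coffee was great , but pricey!"

# Split on whitespace, then drop stop words and stand-alone punctuation.
tokens = tokenizer.tokenize(text)
kept = [w for w in tokens if w not in toy_stop]

print(kept)  # ['coffee', 'great', 'pricey!']
```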

Remove any punctuation that is still attached to the start or end of a word.

filtered_pos_list2 = [w.strip(string.punctuation) for w in filtered_pos_list]
filtered_neg_list2 = [w.strip(string.punctuation) for w in filtered_neg_list]
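A quick sketch of what strip does to a few made-up tokens. It only trims punctuation at the edges of a token, so an apostrophe inside a word stays, and a token made up entirely of punctuation becomes an empty string. The final filter that drops empty strings is an optional extra I am adding here, not part of the original steps.

```python
import string

# Made-up tokens of the kind a whitespace tokenizer can produce.
toy_tokens = ["great!", "...", "don't", "(tasty)"]

# strip() removes punctuation from both ends of each token only.
stripped = [w.strip(string.punctuation) for w in toy_tokens]
print(stripped)  # ['great', '', "don't", 'tasty']

# Optional extra (an assumption, not in the tutorial): drop empty strings.
non_empty = [w for w in stripped if w]
print(non_empty)  # ['great', "don't", 'tasty']
```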

As an optional sidebar, use NLTK’s Frequency Distribution function to check some of the most common words and their number of appearances in the respective reviews.

fd_pos = nltk.FreqDist(filtered_pos_list2) 
fd_neg = nltk.FreqDist(filtered_neg_list2)

A list shows individual words pulled from positive food product reviews and their relative frequency in the sample set.
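To inspect a frequency distribution, the most_common method returns the top words with their counts. Here is a minimal sketch with a made-up word list in place of filtered_pos_list2.

```python
import nltk

# Made-up words standing in for a filtered word list.
toy_words = ["great", "tasty", "great", "coffee", "great", "tasty"]

fd = nltk.FreqDist(toy_words)

# The two most frequent words and their counts.
top_two = fd.most_common(2)
print(top_two)  # [('great', 3), ('tasty', 2)]
```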

Create a function to make the feature sets for text classification. It takes a string, splits it into words, and returns a dictionary that maps each word to True. The sentiment labels are attached in the next step.

def word_features(words):
    # Map every whitespace-separated word in the string to True.
    return {word: True for word in words.split()}
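As a quick check of what the function returns, here it is applied to a short made-up phrase and to an empty string.

```python
def word_features(words):
    # Map every whitespace-separated word in the string to True.
    return {word: True for word in words.split()}

features = word_features("great tasty coffee")
print(features)  # {'great': True, 'tasty': True, 'coffee': True}

# An empty string yields an empty feature dictionary.
empty = word_features("")
print(empty)  # {}
```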

Label the sets of word features and combine into one set to be split for training and testing for sentiment analysis.

positive_features = [(word_features(f), 'pos') for f in filtered_pos_list2]
negative_features = [(word_features(f), 'neg') for f in filtered_neg_list2]

labeledwords = positive_features + negative_features
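It can help to look at the shape of the labeled data. Because each element of the filtered lists is a single token, each feature set in this sketch holds one word. The toy lists are made up, and the empty string shows what happens to a token that punctuation stripping left empty.

```python
def word_features(words):
    # Map every whitespace-separated word in the string to True.
    return {word: True for word in words.split()}

# Made-up token lists standing in for the filtered word lists.
toy_pos = ["great", "tasty"]
toy_neg = ["stale", ""]  # '' can be left behind by punctuation stripping

positive_features = [(word_features(w), "pos") for w in toy_pos]
negative_features = [(word_features(w), "neg") for w in toy_neg]
labeled = positive_features + negative_features

for features, label in labeled:
    print(features, label)
```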

Randomly shuffle the list of words before use in the classifier to reduce the likelihood of bias toward a given feature label.
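The shuffle can presumably be done with random.shuffle from the random package imported earlier, which reorders a list in place. Here is a minimal sketch on a made-up labeled list; the seed is an assumption I added so the order is repeatable. In the tutorial, the same call would be applied to labeledwords.

```python
import random

# Made-up labeled feature sets standing in for labeledwords.
labeled = [
    ({"great": True}, "pos"),
    ({"tasty": True}, "pos"),
    ({"stale": True}, "neg"),
    ({"awful": True}, "neg"),
]
original = list(labeled)

random.seed(42)          # assumption: fixed seed for a repeatable order
random.shuffle(labeled)  # shuffles the list in place and returns None

print(labeled)
```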


Training and Testing the Text Classifier for Sentiment

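Create a training set and a test set from the list. The slices below hold out the first 500 items for testing and train on everything from index 2000 onward, so the two sets do not overlap; the items in between are not used. Then call NLTK’s Naïve Bayes Classifier and train it on the training set.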

train_set, test_set = labeledwords[2000:], labeledwords[:500]
classifier = nltk.NaiveBayesClassifier.train(train_set)
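The arithmetic of the slices is easy to check on a stand-in list. This sketch also shows one alternative, a proportional 80/20 split that uses every item; that alternative and the list length are my assumptions, not the split used for the results in this tutorial.

```python
# Stand-in for a shuffled labeled list; the length is an arbitrary assumption.
data = list(range(10_000))

# Slices as written in the tutorial: indices 500-1999 land in neither set.
train_a, test_a = data[2000:], data[:500]
unused = len(data) - len(train_a) - len(test_a)
print(unused)  # 1500

# Alternative (assumption): hold out 20% for testing, train on the rest.
cut = int(len(data) * 0.2)
test_b, train_b = data[:cut], data[cut:]
print(len(train_b), len(test_b))  # 8000 2000
```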

Calculate the accuracy of the model.

print(nltk.classify.accuracy(classifier, test_set))
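For a sense of what training and scoring look like end to end, here is a toy run with a handful of made-up feature sets. The accuracy is the fraction of test items whose predicted label matches the true label, so a perfect score on two hand-picked items says nothing about the real model.

```python
import nltk

# Made-up one-word feature sets, in the same shape as labeledwords.
train_set = [
    ({"great": True}, "pos"), ({"great": True}, "pos"), ({"tasty": True}, "pos"),
    ({"awful": True}, "neg"), ({"awful": True}, "neg"), ({"stale": True}, "neg"),
]
test_set = [({"great": True}, "pos"), ({"awful": True}, "neg")]

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Fraction of test items classified correctly.
accuracy = nltk.classify.accuracy(classifier, test_set)
print(accuracy)  # 1.0
```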

Provide some test example reviews for proof of concept and print the results.

print(classifier.classify(word_features('I hate this product, it tasted weird')))
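One thing to watch with raw example reviews: the training words were lowercased and stripped of punctuation, so tokens such as ‘I’ or ‘product,’ will not match anything the model has seen. A small helper can apply the same cleaning first. The function name prepare is hypothetical and this is only a sketch; it skips stop word removal, since, as far as I know, NLTK’s Naïve Bayes classifier ignores features it never saw in training.

```python
import string

def word_features(words):
    # Map every whitespace-separated word in the string to True.
    return {word: True for word in words.split()}

def prepare(review):
    """Hypothetical helper: lowercase a review and strip edge punctuation."""
    tokens = [w.strip(string.punctuation) for w in review.lower().split()]
    return " ".join(w for w in tokens if w)

cleaned = prepare("I hate this product, it tasted weird")
print(cleaned)  # i hate this product it tasted weird

# These features can then be passed to classifier.classify(...).
features = word_features(cleaned)
```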

Use NLTK to show the most informative features of the text classifier. This lists the features that best separate the two labels, along with how much more likely each one is to appear in one classification (positive or negative review) than in the other.
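The call for this is show_most_informative_features on the trained classifier. Here is a minimal sketch on a toy classifier trained on made-up feature sets; the same method call would be made on the classifier trained above.

```python
import nltk

# Made-up feature sets; 'taste' appears under both labels.
train_set = [
    ({"great": True}, "pos"), ({"great": True}, "pos"), ({"taste": True}, "pos"),
    ({"awful": True}, "neg"), ({"taste": True}, "neg"), ({"taste": True}, "neg"),
]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Prints a ranked table of features with likelihood ratios between labels.
classifier.show_most_informative_features(5)

# The same ranking is available as a list of (feature name, value) pairs.
top = classifier.most_informative_features(3)
print(top)
```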

Output from NLTK’s most informative features for the Naïve Bayes Classifier shows a list of words, their feature labels, and the likelihood of their occurrence in each review classification.

Further Steps

This was an overview of sentiment analysis with NLTK. There are opportunities to increase the accuracy of the classification model. One example would be to use part-of-speech tagging to train the model using descriptive adjectives or nouns. Another idea to pursue would be to use the results of the frequency distribution and select the most common positive and negative words to train the model.

The full GitHub repository tutorial for this can be found here.