Analyzing Authorship of Disputed Federalist Papers Using Unsupervised Machine Learning

Alice Liu, Ishita Abrol, Saathvik Selvan, Sushant Kannan

Abstract

Authorship attribution, one of the most important topics in Natural Language Processing, deals with texts whose authorship is uncertain. It focuses mainly on writing style rather than subject matter. We use computational stylometry to convert texts into numerical values that can then be processed and analyzed with various models. These values usually represent lexical, syntactic, and/or semantic features. This paper discusses our approach to resolving the controversy over the authorship of several of The Federalist Papers using an unsupervised machine learning algorithm for authorship attribution. After preprocessing each paper, we extracted two sets of features from the articles: TF-IDF and punctuation count frequencies. We then trained a KMeans clustering model on a matrix of these features from the articles with known authorship. The accuracy of this model was 81.82%. To predict the authors of the contested papers, we ran their extracted features through our KMeans model and matched the resulting cluster labels with the labels of our known papers. From this, we found that the majority of the disputed papers were authored by Hamilton.
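The abstract does not say exactly how cluster labels were matched to authors or how the punctuation features were normalized, so the following is only a minimal sketch under stated assumptions: punctuation counts are divided by the text length, and each cluster is assigned the author who wrote most of the known papers in it. The function names, the punctuation set, and the toy cluster assignments are all hypothetical and are not the results reported here.

```python
from collections import Counter

# Hypothetical set of punctuation marks to track; the paper's actual set is not stated.
MARKS = ",.;:!?'\"-()"

def punctuation_frequencies(text, marks=MARKS):
    """Return one value per mark: its count divided by the number of characters."""
    total = len(text) or 1  # avoid division by zero on empty input
    counts = Counter(ch for ch in text if ch in marks)
    return [counts[m] / total for m in marks]

def map_clusters_to_authors(cluster_ids, authors):
    """Assumed matching rule: each cluster takes the majority author of its known papers."""
    votes = {}
    for cid, author in zip(cluster_ids, authors):
        votes.setdefault(cid, Counter())[author] += 1
    return {cid: tally.most_common(1)[0][0] for cid, tally in votes.items()}

def clustering_accuracy(cluster_ids, authors, mapping):
    """Fraction of known papers whose cluster's assigned author is the true author."""
    hits = sum(mapping[cid] == author for cid, author in zip(cluster_ids, authors))
    return hits / len(authors)

# Toy cluster assignments standing in for KMeans output (illustrative only).
known_clusters = [0, 0, 0, 1, 1, 0]
known_authors = ["Hamilton", "Hamilton", "Hamilton", "Madison", "Madison", "Madison"]

mapping = map_clusters_to_authors(known_clusters, known_authors)
accuracy = clustering_accuracy(known_clusters, known_authors, mapping)

# Disputed papers inherit the author assigned to whichever cluster they fall in.
disputed_clusters = [0, 1, 0]
predicted = [mapping[c] for c in disputed_clusters]
```

In a full pipeline, the cluster ids would come from a clustering model fitted on the combined TF-IDF and punctuation feature matrix; the mapping step is what turns anonymous cluster numbers into author names and makes an accuracy figure meaningful for an unsupervised model.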

GitHub Link

Our Code!

Our Poster

Poster

Our Paper

Our Demo

View Our Model Predictions!