05 / Project

Sentiment Analysis on 1.6M Tweets

Large-scale NLP on the Kaggle Twitter dataset

sentiment-analysis.stack

The Problem

Understanding public sentiment at scale requires a pipeline that can clean, vectorize, and classify millions of noisy, informal tweets.

What I Built

A sentiment classifier (positive / negative / neutral) trained on 1.6M tweets using a classical NLP pipeline.

Approach

Tweets are tokenized and normalized with NLTK, turned into embedding vectors with gensim, and classified with scikit-learn models. SciPy, NumPy, and pandas handle the underlying array and tabular data operations.
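
To make the clean-then-vectorize steps concrete, here is a minimal sketch in plain Python. It is a dependency-free stand-in, not the project's code: the real pipeline uses NLTK for tokenization and gensim for embeddings, and the normalization rules, the `<url>` and `<user>` placeholder tokens, and the function names `normalize_tweet` and `mean_vector` are all assumptions made for illustration. Averaging word vectors into one tweet vector is one common way to feed embeddings to a classical classifier; the page does not say which pooling the project uses.

```python
import re

# Hypothetical normalization rules; the project's actual rules are not stated.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated three or more times
TOKEN_RE = re.compile(r"<url>|<user>|[a-z0-9']+")


def normalize_tweet(text):
    """Lowercase a tweet and return a list of normalized tokens."""
    text = text.lower()
    text = URL_RE.sub(" <url> ", text)        # collapse every link to one token
    text = MENTION_RE.sub(" <user> ", text)   # collapse every @mention to one token
    text = text.replace("#", " ")             # keep the hashtag word, drop the marker
    text = ELONGATED_RE.sub(r"\1\1", text)    # shorten letter runs to two letters
    return TOKEN_RE.findall(text)             # punctuation is dropped here


def mean_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]
```

With a toy two-dimensional embedding table, `mean_vector(normalize_tweet("Good day!"), {"good": [1.0, 0.0], "day": [0.0, 1.0]}, 2)` yields one fixed-length vector per tweet, which is the shape a scikit-learn classifier expects as a feature row.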