Sentiment Analysis

  • Tech Stack: Python 3.7+, Pandas, NumPy, Scikit-learn, XGBoost
  • Models: SVM, KNN, Decision Tree, Random Forest, XGBoost, Logistic Regression
  • GitHub URL: Project Link

In this project, I built sentiment classifiers on a dataset of 1.6 million tweets. The dataset has six columns: 'target' (tweet polarity: 0 = negative, 4 = positive), 'ids' (unique tweet ID), 'date' (timestamp), 'flag' (the search query, or NO_QUERY when there is none), 'user' (the account that posted the tweet), and 'text' (tweet content). I cleaned the text with regular expressions to remove emojis, URLs, and irrelevant characters, then performed feature engineering, extracting metrics such as the maximum and minimum tweet lengths. I trained and tuned several machine learning models to classify tweet sentiment and saved the trained models for future use. The project strengthened my skills in data preprocessing, feature engineering, and machine learning, and gave me practical experience handling large datasets.
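
The cleaning and length-feature steps described above could look roughly like the sketch below. It is illustrative only: the regex patterns, the removal of @mentions, and the helper names (clean_tweet, length_stats, binarize_target) are assumptions made for this example, not the exact code used in the project.

```python
import re

# Assumed patterns; the project's actual regexes are not shown in this write-up.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")          # assumption: mentions count as noise
NON_ALPHA_RE = re.compile(r"[^a-z\s]")    # drops emojis, digits, punctuation
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, emojis, and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def length_stats(texts):
    """Return the minimum and maximum character length over the tweets."""
    lengths = [len(t) for t in texts]
    return {"min_len": min(lengths), "max_len": max(lengths)}


def binarize_target(target: int) -> int:
    """Map the dataset's polarity labels (0 = negative, 4 = positive) to 0/1."""
    return 1 if target == 4 else 0


# Hypothetical sample tweets, not taken from the dataset.
sample = [
    "@bob I love this!! 😍 http://t.co/abc",
    "so sad today :( www.example.com",
]
cleaned = [clean_tweet(t) for t in sample]
print(cleaned)                # ['i love this', 'so sad today']
print(length_stats(cleaned))  # {'min_len': 11, 'max_len': 12}
```

On the full dataset, the same function could be applied to the 'text' column (for example with Pandas' Series.apply) before vectorizing the text and fitting the classifiers.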