Özge Karasu

Curiosity leads me. I follow and write.

Sentiment Analysis from Tweets using SVM

08.11.2024

Project Overview

This project focused on building a sentiment analysis pipeline using Support Vector Machines (SVMs) to classify tweets as either positive or negative. The task involved preprocessing noisy text data, extracting informative features, and evaluating model performance through cross-validation and error analysis.
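
As a rough illustration of the cleaning step, here is a minimal sketch that uses only the Python standard library. It is not the project's code: the stop word list, the emoji mapping, and the function name preprocess are all hypothetical, plain whitespace splitting stands in for NLTK tokenisation, and lemmatisation is left out.

```python
import re
import string

# Hypothetical, deliberately tiny stop word list (assumption for illustration).
STOP_WORDS = {"a", "an", "the", "is", "am", "are", "i", "to", "and", "of", "it", "this"}

# Hypothetical emoji mapping: a few emoji become sentiment-bearing tokens.
EMOJI_TOKENS = {
    "😊": " emoji_pos ",
    "😍": " emoji_pos ",
    "😢": " emoji_neg ",
    "😡": " emoji_neg ",
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Remove all ASCII punctuation except the underscore used in the emoji tokens.
PUNCT_TABLE = str.maketrans("", "", string.punctuation.replace("_", ""))


def preprocess(tweet: str) -> list[str]:
    """Lowercase, drop URLs and mentions, map emoji, strip punctuation, filter stop words."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    for emoji, token in EMOJI_TOKENS.items():
        text = text.replace(emoji, token)
    text = text.translate(PUNCT_TABLE)
    # Whitespace splitting is a stand-in for a real tokeniser.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


tokens = preprocess("I LOVE this phone!!! 😍 http://t.co/abc @apple")
print(tokens)
```

Mapping emoji to tokens instead of deleting them keeps a signal that often carries sentiment in tweets; whether that helps on a given dataset is something to check empirically.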

Techniques Used

  • Preprocessing:

    • Tokenisation with NLTK
    • Lowercasing, punctuation removal, stop word filtering
    • Lemmatisation and emoji handling
  • Feature Extraction:

    • Bag-of-Words and TF-IDF vectorisation
    • Word frequency and binary presence tests
  • Model:

    • Support Vector Machines (LinearSVC from scikit-learn)
    • Grid search over C-values
    • 10-fold cross-validation
  • Evaluation:

    • Accuracy, Precision, Recall, F1-score
    • Confusion matrix
    • Manual error inspection (False Positives/Negatives)
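
The feature extraction and model steps listed above can be sketched with scikit-learn roughly as follows. This is an assumed setup, not the project's actual configuration: the toy tweets, the C grid, the F1 scoring choice, and the step names tfidf and svm are all made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy data invented for this sketch; 1 = positive, 0 = negative.
positive = [
    "i love this phone", "what a great day", "this movie was amazing",
    "so happy with the service", "best coffee ever",
    "really enjoyed the concert", "great job team", "love the new update",
    "amazing food and friendly staff", "happy and excited for the weekend",
]
negative = [
    "i hate this phone", "what a terrible day", "this movie was awful",
    "so angry with the service", "worst coffee ever",
    "really disliked the concert", "bad job team", "hate the new update",
    "awful food and rude staff", "sad and tired of the weekend",
]
texts = positive + negative
labels = [1] * len(positive) + [0] * len(negative)

# Vectoriser and classifier live in one pipeline so that the TF-IDF
# vocabulary is fitted on the training folds only.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", LinearSVC(random_state=0)),
])

# Assumed grid of C values; the values used in the project are not stated.
param_grid = {"svm__C": [0.01, 0.1, 1.0, 10.0]}

# 10-fold cross-validation, stratified so each fold contains both classes.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(texts, labels)

print("best C:", search.best_params_["svm__C"])
print("mean cross-validated F1:", search.best_score_)
```

Replacing TfidfVectorizer() with CountVectorizer(binary=True) from the same module would give binary presence features, which makes it easy to compare the feature variants under an identical cross-validation split.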

Results

  • The best performance was achieved with TF-IDF features and an SVM whose C value was tuned by grid search.
  • The F1-score improved from a baseline of ~0.76 to ~0.83 after advanced preprocessing and feature tuning.
  • The model generalised well, with minimal overfitting.
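
To make the reported metrics concrete, here is a small worked sketch of how precision, recall, and F1 follow from the confusion matrix counts, and how false positives and false negatives can be collected for manual inspection. The labels, predictions, and function names are invented for illustration and are not the project's data or code.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def misclassified(texts, y_true, y_pred, positive=1):
    """Split the errors into false positives and false negatives for manual review."""
    false_pos = [t for t, y, p in zip(texts, y_true, y_pred) if p == positive and y != positive]
    false_neg = [t for t, y, p in zip(texts, y_true, y_pred) if p != positive and y == positive]
    return false_pos, false_neg


# Made-up example: 8 tweets, 1 = positive, 0 = negative.
texts = ["t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7"]
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
accuracy = (tp + tn) / len(y_true)
false_pos, false_neg = misclassified(texts, y_true, y_pred)

print(tp, fp, fn, tn)                   # 3 1 1 3
print(precision, recall, f1, accuracy)  # 0.75 0.75 0.75 0.75
print(false_pos, false_neg)             # ['t4'] ['t1']
```

In practice sklearn.metrics provides confusion_matrix and classification_report for the same quantities; writing them out by hand only shows where the numbers come from.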