Hate Speech Detection on Twitter

Thakur Alabhya Singh
5 min read · Apr 6, 2022



This article is a quick summary of my undergraduate project, which I worked on under the guidance of Ms. P. Poornima (Asst. Professor, MGIT) and Dr. CRK Reddy (Head of Dept, CSE, MGIT).

GitHub link:

You can find the code for the entire project at: https://github.com/alabhyasingh/UndergradNLPproject


Twitter receives nearly 500 million tweets every single day, making it impossible for human reviewers alone to scan them all and detect hate.

This project proposes an ML-based model trained to detect hate speech. The objective is to create a system that can automatically classify tweets as hate or non-hate.


“The research released by reporting forum Stop AAPI Hate on Tuesday revealed nearly 3,800 incidents of hate crimes reported over the course of roughly a year, with an estimate of unreported hate crimes going into thousands more…” - NBC News.

This spread of hate can also be attributed to social media.

Hence, it is more important now than ever to have systems in place that detect and check hate, keeping the internet free of bias and safe for everyone to use.


The idea was to train several classification techniques on an existing dataset of hate and non-hate tweets and find the one that detects hate speech with the greatest accuracy.

To achieve this, I first cleaned the dataset, converted it into a machine-readable numeric form (that is, vectorized it), and then trained various models on it.

The final step was to test each classification technique and find the one with the greatest accuracy in detecting hate speech in tweets.

To summarize:

Dataset Inspection:

I used a CSV file from Kaggle containing 31,962 tweets as the dataset for the project.

The dataset can be found here: https://www.kaggle.com/datasets/vkrahul/twitter-hate-speech

The dataset contains ‘tweets’ and a corresponding ‘label’ for each tweet. The tweets are strings of content posted by users, and each label marks its tweet as “hate” or “non-hate”.

Since the objective of the project is to classify tweets as hate or non-hate, it falls under a branch of machine learning known as classification modeling.
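As a rough illustration of this inspection step, the sketch below reads tweet and label pairs from a CSV and counts how many tweets carry each label. The column names `tweet` and `label`, the label strings, and the three sample rows are assumptions made for the example; they are not taken from the Kaggle file or from the project code.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the Kaggle CSV.
# Column names and rows are assumptions for illustration only.
SAMPLE_CSV = """label,tweet
non-hate,what a lovely morning with friends
hate,example of an abusive tweet
non-hate,looking forward to the weekend
"""


def load_tweets(handle):
    """Read (tweet, label) pairs from an open CSV file handle."""
    reader = csv.DictReader(handle)
    return [(row["tweet"], row["label"]) for row in reader]


def label_counts(rows):
    """Count how many tweets carry each label."""
    return Counter(label for _, label in rows)


rows = load_tweets(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(len(rows), dict(counts))
```

With the real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened file handle, and the column names adjusted to whatever the CSV actually uses.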

Classification Modeling Problem:

“In machine learning, classification refers to a predictive modeling problem where a class label is predicted for a given example of input data.

Examples of classification problems include:

  • Given an example, classify if it is spam or not.
  • Given a handwritten character, classify it as one of the known characters.
  • Given recent user behavior, classify as churn or not.

From a modeling perspective, classification requires a training dataset with many examples of inputs and outputs from which to learn.

A model will use the training dataset and will calculate how to best map examples of input data to specific class labels. As such, the training dataset must be sufficiently representative of the problem and have many examples of each class label.”

Test Data Visualisation:

I visualized the dataset as word clouds showing the most repeated words for both the hate and non-hate labels. They gave an insight into the most used words.

The larger the size of the word in the word cloud, the greater its repetition in the data set.
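The sizes in a word cloud come from word counts. A minimal sketch of that counting step is shown here, using only the standard library; the sample tweets are made up, and the drawing of the cloud itself (which the project would have done with a plotting library) is left out.

```python
from collections import Counter


def word_frequencies(tweets):
    """Count how often each lowercase word appears across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


# Made-up tweets, for illustration only.
sample_tweets = ["love this sunny day", "love my family", "sunny sunny morning"]

freqs = word_frequencies(sample_tweets)
top_words = freqs.most_common(2)  # the words that would be drawn largest
print(top_words)
```

Running the same count separately on the hate and non-hate tweets gives the two sets of frequencies that the two clouds are built from.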

Word cloud for non-hate tweets:

Word cloud for hate tweets:

1. Preprocessing:

In the preprocessing stage, I cleaned the dataset of noisy content. Every tweet was stripped of the parts that only add computational effort and contribute nothing to building an efficient model. The following steps were run on the dataset:

  • Removing “@”
  • Removing numbers
  • Removing Greek characters
  • Removing slang
  • Extracting the word attached to each #
  • Removing stop words
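The cleaning steps listed above can be sketched with a few regular expressions. This is an illustrative version, not the project's code: it assumes that removing “@” means dropping whole @mentions, it uses a tiny hand-written stop-word set, and it skips slang removal because the article does not say how slang was identified.

```python
import re

# Tiny illustrative stop-word set (an assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of", "in", "for", "this"}


def clean_tweet(text):
    """Apply the cleaning steps to one tweet and return the cleaned string."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)      # drop @mentions (assumed meaning of removing "@")
    text = re.sub(r"#(\w+)", r"\1", text)  # keep the word attached to a hashtag, drop the "#"
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # remove Greek and other non a-z characters, plus punctuation
    words = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(words)


cleaned = clean_tweet("@user This is 100 percent #Happy news!!! αβγ")
print(cleaned)
```

The order matters here: hashtags and mentions are handled before the catch-all character filter, otherwise the “#” and “@” markers would already be gone when their patterns run.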

This is how the data looked after cleaning:

snapshot of results after preprocessing the tweets

Finally, I applied two additional procedures to prepare the text further:

  • Lemmatization: ‘It is the process of grouping together the different inflected forms of a word so they can be analyzed as a single item.’
  • Tokenization: ‘In order to get our computer to understand any text, we need to break that word down in a way that our machine can understand. That’s where the concept of tokenization in Natural Language Processing (NLP) comes in.’
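To make the two procedures concrete, here is a toy sketch. The tokenizer simply pulls out runs of letters, and the lemmatizer is a small hand-written lookup table; both are assumptions made for the example. In practice a library lemmatizer (for instance NLTK's `WordNetLemmatizer`) would normally supply the word-to-lemma mapping.

```python
import re

# Hypothetical lemma lookup for illustration; a real lemmatizer covers the whole vocabulary.
LEMMA_TABLE = {"running": "run", "ran": "run", "tweets": "tweet", "better": "good"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMA_TABLE.get(token, token) for token in tokens]


tokens = tokenize("Running tweets ran better")
lemmas = lemmatize(tokens)
print(tokens)
print(lemmas)
```

Here “running” and “ran” both collapse to “run”, which is the grouping of inflected forms that the lemmatization definition describes.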

2. Vectorization:

‘Processing natural language text and extracting useful information from the given word, a sentence using machine learning and deep learning techniques requires the string/text needs to be converted into a set of real numbers (a vector) — Word Vectorization.’

TF-IDF: To convert the text to vectors, I used the TF-IDF (term frequency-inverse document frequency) technique.

Due to memory constraints, I used only the top 500 features rather than every word in the corpus to generate the vectors. Doing so gave almost the same results while significantly reducing the memory needed to run the code.

code for TFIDF being run

This turned the tweets into vectors that could be fed to modeling algorithms for classification.
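The idea of TF-IDF with a capped vocabulary can be sketched in plain Python. This is a simplified illustration and not the code from the project: it assumes whitespace-separated, already-cleaned tweets, uses a smoothed IDF of the form ln((1 + n) / (1 + df)) + 1, and omits the vector normalization that libraries usually apply. Libraries such as scikit-learn offer the same capability through `TfidfVectorizer` and its `max_features` parameter.

```python
import math
from collections import Counter


def tfidf_vectors(documents, max_features=500):
    """Return (vocabulary, vectors) using only the most frequent terms."""
    tokenized = [doc.split() for doc in documents]

    # Keep only the max_features most frequent terms across the corpus.
    corpus_counts = Counter(token for doc in tokenized for token in doc)
    vocab = [term for term, _ in corpus_counts.most_common(max_features)]

    # Document frequency: in how many documents each term appears.
    n_docs = len(tokenized)
    doc_freq = {term: sum(1 for doc in tokenized if term in doc) for term in vocab}

    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {term: math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocab}

    vectors = []
    for doc in tokenized:
        term_counts = Counter(doc)
        vectors.append([term_counts[term] * idf[term] for term in vocab])
    return vocab, vectors


# Made-up cleaned tweets, for illustration only.
docs = ["happy day", "happy news", "sad day today"]
vocab, vectors = tfidf_vectors(docs, max_features=3)
print(vocab)
print(vectors)
```

Capping the vocabulary fixes the length of every vector at `max_features`, which is what keeps memory use bounded no matter how many distinct words the corpus contains.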

3. Modeling:

In this step, I first divided the vectorized data into two sets, one for training and one for testing the classifiers. I chose Naive Bayes, Random Forest, Logistic Regression, and Decision Tree as the four classification algorithms.

After training the algorithms with a part of the vectorized data, the remaining dataset was used to test the four algorithms. Each algorithm produced its own accuracy in detecting hate speech.
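The split-then-score procedure described above can be sketched as follows. The 80/20 ratio, the fixed seed, and the helper names are assumptions made for the example; the article does not state the split that was actually used, and the classifiers themselves are left out.

```python
import random


def train_test_split(features, labels, test_ratio=0.2, seed=42):
    """Shuffle the rows and split them into training and test portions."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_test = int(len(indices) * test_ratio)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(sequence, chosen):
        return [sequence[i] for i in chosen]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Made-up feature vectors and labels, for illustration only.
X = [[i] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), len(X_test))
print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

Each classifier is fitted on the training portion only, and `accuracy` is then computed from its predictions on the held-out test portion, so every model is scored on tweets it has not seen.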

1. Naive Bayes:

results of Naive Bayes classification

The overall accuracy of Naive Bayes was pretty low at 52%.

2. Random Forest:

results of Random Forest classification

The overall accuracy of Random Forest was 95%.

3. Logistic Regression:

results of Logistic Regression classification

4. Decision Tree:

results of Decision Tree classification

The overall accuracy of Decision Tree was 94%.


I had chosen Naive Bayes as the benchmark for the classifiers’ performance. Of the four, the best classification technique was Random Forest, which classified hate and non-hate tweets correctly 95% of the time!


