ML Model

We used a publicly available Kaggle dataset to develop an NLP hate speech detection model. The dataset suited our use case well: it was already largely clean, and the choice of words differed clearly between hate speech and non-hate speech.
Data cleaning mostly consisted of removing symbols and characters such as ‘/’, ‘*’, ‘%’, and ‘@’. We also used the stopwords library available in Python to remove stop words such as “to”, “and”, and “is”.
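The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_tweet`, the tiny `STOP_WORDS` set, and the lowercasing step are all assumptions made for the example, and a real stop-word library would supply a much longer list.

```python
import re

# Hypothetical, deliberately tiny stop-word set for illustration only.
# A stop-word library would provide a far more complete list.
STOP_WORDS = {"to", "and", "is", "the", "a", "of", "in"}


def clean_tweet(text: str) -> str:
    """Strip symbols and drop stop words from a single tweet."""
    # Lowercasing is an assumption; it makes stop-word matching case-insensitive.
    text = text.lower()
    # Replace every character that is not a letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [tok for tok in text.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("@user This is 100% wrong / and *rude*"))
# prints: user this 100 wrong rude
```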
Since tweets labeled as hate speech made up only a small share of the dataset, we undersampled the majority class to achieve better classification results. We also used the wordcloud module in Python to get a better understanding of the data we were dealing with.
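Random undersampling of this kind might look like the sketch below. The write-up does not state the target class ratio, so the 1:1 balance here is an assumption, as are the function name `undersample`, the `label` key, and the row layout.

```python
import random


def undersample(rows, label_key="label", seed=42):
    """Randomly drop rows until every class has as many rows as the rarest one."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    # Assumed target: a 1:1 balance, sized to the smallest class.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: 10 non-hate rows (label 0) and 3 hate rows (label 1).
rows = [{"text": f"ok {i}", "label": 0} for i in range(10)]
rows += [{"text": f"hate {i}", "label": 1} for i in range(3)]
balanced = undersample(rows)
print(len(balanced))
```

Undersampling discards data, so it trades some information from the majority class for a classifier that is less biased toward predicting that class.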
Next, we applied TF-IDF vectorization to convert the text into numeric vectors that our model could work with.
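To make the idea concrete, here is a small pure-Python sketch of TF-IDF weighting. It is not the vectorizer the project used: it assumes whitespace tokenization, raw term counts, and a smoothed inverse document frequency, and it skips the vector normalization that libraries often apply. The name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one TF-IDF vector per document."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF: terms found in every document still keep a weight of 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)  # raw term frequency within this document
        vectors.append([counts[tok] * idf[tok] for tok in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["you are kind", "you are awful awful"])
print(vocab)
print(vectors)
```

Words that appear in every document ("you", "are") receive the minimum weight, while a word concentrated in one document ("awful") is weighted up, which is what lets a linear model pick out distinctive vocabulary.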
We wanted to keep our ML model simple, so we chose Logistic Regression: it is easy to implement, and a simple model is less prone to overfitting, which matters because our model has to handle real-world input.
We achieved an accuracy of about 85% on the test data.
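For readers unfamiliar with the model, the following is a from-scratch sketch of binary logistic regression trained by batch gradient descent, plus the accuracy metric. It is only an illustration of the technique: the project most likely relied on a library implementation, and the learning rate, epoch count, function names, and toy data below are all assumptions. The toy data does not reproduce the accuracy reported above.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(w * v for w, v in zip(weights, xi)) + bias)
            err = p - yi  # gradient of the log loss w.r.t. the raw score
            for j, v in enumerate(xi):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights, bias, x):
    """Return 1 (flagged) when the predicted probability is at least 0.5."""
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


def accuracy(weights, bias, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(weights, bias, xi) == yi for xi, yi in zip(X, y))
    return correct / len(y)


# Toy, linearly separable feature vectors standing in for TF-IDF rows.
X = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)
print(accuracy(weights, bias, X, y))
```

In a real evaluation, accuracy would be computed on a held-out test split rather than on the training rows used here.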

Discord Bot

We used the JavaScript library ‘discord.js’ to program our bot, relying on library features such as ‘client.on’ and ‘msg.author.bot’. We also used the node-fetch module to make API calls, which let our bot pass data to our backend and retrieve the outcome produced by the ML model. Our bot checks each user’s text message for hate speech, flags it according to the boolean value returned by the backend, and posts a warning message if the text is found to be hate speech. It also checks for NSFW links and URLs and posts warning messages accordingly.
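The bot itself is written in JavaScript, but its decision flow is language-agnostic and can be sketched in Python, in keeping with the model code. Everything named here is hypothetical: `moderate_message`, the two injected checker functions (stand-ins for the backend call and the NSFW check), the URL pattern, and the warning texts.

```python
import re

# Simplified pattern for spotting links; an assumption for this sketch.
URL_PATTERN = re.compile(r"https?://\S+")


def moderate_message(text, author_is_bot, is_hate_speech, is_nsfw_url):
    """Return a warning string, or None when no action is needed.

    is_hate_speech and is_nsfw_url are injected callables standing in for
    the backend API call and the NSFW link check, respectively.
    """
    if author_is_bot:
        return None  # ignore messages from bots, including the bot itself
    if is_hate_speech(text):
        return "Warning: this message was flagged as hate speech."
    for url in URL_PATTERN.findall(text):
        if is_nsfw_url(url):
            return "Warning: this message links to NSFW content."
    return None


# Example with stub checkers in place of the real backend.
print(moderate_message("hello there", False, lambda t: False, lambda u: False))
```

Passing the checks in as functions keeps the flow testable without a running backend; in the real bot, the hate speech check would be the API call made through node-fetch.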

Twitter Bot

We used the JavaScript module Puppeteer to automate our Twitter bot. We provide users with a hashtag, #Ispy_Saveme, that triggers our bot, and the bot keeps a check on the Twitter accounts that use it. If someone posts a tweet using this hashtag, our bot checks whether it is hate speech and warns the offender. Currently we maintain an in-memory array of all the users who have used our hashtag; in the future we will store them in a database. Because Twitter applies stringent measures when allotting API access, we were not able to obtain it, so we used Puppeteer as an alternative.
