Interested in knowing how
I-SPY works? Sit tight!
You're in the right place. Scroll down to read through it.
We used a publicly available Kaggle dataset to develop an NLP hate
speech detection model. The dataset suited our use case because it was
already mostly clean and the choice of words differed clearly between
hate speech and non-hate speech.
For the most part, data cleaning only involved removing symbols and
characters such as ‘/’, ‘*’, ‘%’, and ‘@’. We also used the stopwords
library available in Python to remove stop words such as “to”, “and”,
and “is”.
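The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name ‘clean_tweet’, the small hard-coded stop-word set (standing in for the stopwords library), and the lowercasing step are all assumptions made so the example is self-contained.

```python
import re

# Assumption: a tiny hand-picked stop-word set so the sketch runs on its own;
# the project used a stop-word library instead.
STOP_WORDS = {"to", "and", "is", "the", "a", "of"}


def clean_tweet(text: str) -> str:
    """Lowercase the text, strip symbols, and drop stop words."""
    # Assumption: lowercase first so stop words match regardless of case.
    lowered = text.lower()
    # Replace anything that is not an ASCII letter, digit, or whitespace
    # with a space. This removes symbols such as '/', '*', '%', and '@'.
    no_symbols = re.sub(r"[^a-z0-9\s]", " ", lowered)
    # split() also collapses the runs of whitespace left by the substitution.
    tokens = [tok for tok in no_symbols.split() if tok not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("@user This is 100% great/awful *and* to go!"))
# prints "user this 100 great awful go"
```

Replacing symbols with a space rather than deleting them keeps words such as “great/awful” from being glued together.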
Since the share of tweets labeled as hate speech was very low compared
to non-hate speech, we decided to undersample the majority class to
achieve better classification results. We also used the wordcloud
module in Python to get a better understanding of the dataset we were
dealing with.
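Random undersampling can be sketched as below. The helper name ‘undersample’, the (text, label) tuple layout, and the fixed seed are assumptions for illustration; the source does not say which tool or sampling ratio the project used.

```python
import random


def undersample(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest."""
    # Group the rows by their class label.
    by_label = {}
    for row in rows:
        by_label.setdefault(label_of(row), []).append(row)

    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    balanced = []
    for group in by_label.values():
        # Sample without replacement, keeping `smallest` rows per class.
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid all rows of one class appearing first
    return balanced


# Hypothetical toy data: 8 non-hate tweets (label 0) and 2 hate tweets (label 1).
tweets = [(f"fine tweet {i}", 0) for i in range(8)] + [("bad one", 1), ("bad two", 1)]
balanced = undersample(tweets, label_of=lambda row: row[1])
print(len(balanced))  # prints 4
```

The trade-off is that undersampling discards majority-class examples, so it works best when the majority class is large enough to spare them.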
Later on, we applied TF-IDF vectorization to convert the text into
numerical vectors that our model could take as input.
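To show what TF-IDF computes, here is a from-scratch sketch using the textbook definition (term frequency times log of inverse document frequency). It is not the project's code: library vectorizers typically add smoothing and normalization, so their numbers will differ, and the function name ‘tfidf_vectors’ is an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) with tf = count / doc length, idf = log(N / df)."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: the number of documents each word appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid dividing by zero for empty documents
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


vocab, vectors = tfidf_vectors(["you are awful", "you are kind"])
print(vocab)  # prints ['are', 'awful', 'kind', 'you']
```

In this toy example, “you” and “are” occur in both documents and get a weight of zero, while “awful” and “kind” get positive weights. That is the property that makes TF-IDF useful when the two classes differ mainly in word choice.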
We wanted to keep our ML model simple, so we went with Logistic
Regression: it is easy to implement, and a simple model is less likely
to overfit, which matters because our model has to handle real-world
input.
We achieved an accuracy of about 85% on the test data.
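For readers curious what Logistic Regression does under the hood, here is a from-scratch sketch trained with batch gradient descent, together with the accuracy metric mentioned above. It is an illustration only, not the project's training code: the function names, learning rate, epoch count, and toy data are all assumptions, and a library implementation would normally be used in practice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this example: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return True when the predicted probability of the positive class is >= 0.5."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(predict(w, b, xi) == bool(yi) for xi, yi in zip(X, y)) / len(X)


# Hypothetical two-feature toy data; label 1 stands for hate speech.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, 0, 0]
w, b = train_logistic_regression(X, y)
print(accuracy(w, b, X, y))
```

In the real pipeline the feature vectors would come from the TF-IDF step, and accuracy would be measured on held-out test data rather than on the training examples as in this toy run.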
We used the JavaScript library ‘discord.js’ to program our Discord bot, relying on its API, such as the ‘client.on’ event listener and the ‘msg.author.bot’ property. We also used the node-fetch module to make API calls, which let our bot send data to our backend and retrieve the result produced by the ML model. Our bot checks each user’s text message for hate speech, flags it according to the boolean value returned by the backend, and posts a warning message if it is found to be hate speech. It also checks for NSFW links and URLs and posts warning messages accordingly.
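The decision flow of the bot can be sketched independently of discord.js. The sketch below is in Python, matching the ML side of the project, rather than the bot's actual JavaScript. Everything named here is an assumption for illustration: ‘moderate’ is a hypothetical helper, ‘classify_text’ stands in for the HTTP call to the backend, and the domain blocklist is invented because the source does not say how NSFW links are detected.

```python
import re

# Captures the host part of http(s) URLs found in a message.
URL_PATTERN = re.compile(r"https?://([^\s/]+)")

# Hypothetical blocklist; the source does not describe the real NSFW check.
NSFW_DOMAINS = {"nsfw.example"}


def moderate(text, author_is_bot, classify_text):
    """Return a warning string, or None when the message needs no action."""
    if author_is_bot:
        # Ignore messages written by bots, including the bot's own warnings.
        return None
    if classify_text(text):
        # The backend returned True: the model flagged the text as hate speech.
        return "Warning: this message was flagged as hate speech."
    for match in URL_PATTERN.finditer(text):
        if match.group(1).lower() in NSFW_DOMAINS:
            return "Warning: this message contains an NSFW link."
    return None


def fake_backend(text):
    """Stand-in classifier so the sketch runs without a real backend."""
    return "hate" in text.lower()


print(moderate("I hate you", False, fake_backend))
# prints "Warning: this message was flagged as hate speech."
```

Skipping bot authors first matters: without that check, the bot could react to its own warning messages.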
We used the JavaScript module Puppeteer to automate our Twitter bot. We provide users with a hashtag [ #Ispy_Saveme ] that they can use to trigger our bot, and our bot then keeps a check on those Twitter accounts. If someone else posts a tweet using this hashtag, our bot checks whether it is hate speech and warns the offender. Currently we maintain an internal array of all the users using our hashtag; in the future we will maintain a database for this. Since Twitter applies stringent measures when allotting API access to users, we were not able to get it, so we used Puppeteer as an alternative.
Show us your love and support ❤️ by upvoting!