AI / ML

Toxicity Mining

Categorizes toxic language and hate speech on social media using BERT and traditional ML classifiers.

2024
TypeScript, BERT, DistilBERT, LightGBM, SVM

About This Project

This machine learning project categorizes toxic language and hate speech on social media platforms, distinguishing generic offensive language from hate speech targeted at specific entities, users, or groups. It uses the BERT and DistilBERT transformer models alongside traditional classifiers: Naive Bayes, Logistic Regression, SVM, and LightGBM. The models are trained on the Google Jigsaw Civil Comments dataset (1.8M rows) and the TweetEval Hate Speech dataset.
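To make the distinction between generic offensive language and targeted hate speech concrete, here is a minimal sketch of one of the traditional classifiers named above, a multinomial Naive Bayes with add-one smoothing, written in TypeScript. It is not the project's code. The label names (`neutral`, `offensive`, `targeted`), the `NaiveBayesClassifier` class, the tokenizer, and the nine training sentences are all hypothetical and invented for illustration; none of the text comes from Civil Comments or TweetEval.

```typescript
// Hypothetical three-way label set mirroring the generic-vs-targeted distinction.
type Label = "neutral" | "offensive" | "targeted";
const LABELS: Label[] = ["neutral", "offensive", "targeted"];

interface Example {
  text: string;
  label: Label;
}

// Deliberately simple tokenizer: lowercase, split on anything that is not a-z.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z]+/).filter((t) => t.length > 0);
}

class NaiveBayesClassifier {
  private docCounts = new Map<Label, number>();
  private tokenCounts = new Map<Label, Map<string, number>>();
  private totalTokens = new Map<Label, number>();
  private vocabulary = new Set<string>();
  private totalDocs = 0;

  // Accumulate per-label document counts and per-label token counts.
  train(examples: Example[]): void {
    for (const example of examples) {
      const label = example.label;
      this.totalDocs += 1;
      this.docCounts.set(label, (this.docCounts.get(label) ?? 0) + 1);
      const counts = this.tokenCounts.get(label) ?? new Map<string, number>();
      this.tokenCounts.set(label, counts);
      for (const token of tokenize(example.text)) {
        counts.set(token, (counts.get(token) ?? 0) + 1);
        this.totalTokens.set(label, (this.totalTokens.get(label) ?? 0) + 1);
        this.vocabulary.add(token);
      }
    }
  }

  // Return the label with the highest log posterior (log prior + smoothed log likelihoods).
  predict(text: string): Label {
    if (this.totalDocs === 0) throw new Error("model has not been trained");
    const tokens = tokenize(text);
    let best: Label = "neutral";
    let bestScore = -Infinity;
    for (const label of LABELS) {
      const docs = this.docCounts.get(label) ?? 0;
      if (docs === 0) continue; // skip labels with no training documents
      const counts = this.tokenCounts.get(label) ?? new Map<string, number>();
      const total = this.totalTokens.get(label) ?? 0;
      let score = Math.log(docs / this.totalDocs);
      for (const token of tokens) {
        const count = counts.get(token) ?? 0;
        // Add-one (Laplace) smoothing so unseen tokens do not zero out the score.
        score += Math.log((count + 1) / (total + this.vocabulary.size));
      }
      if (score > bestScore) {
        bestScore = score;
        best = label;
      }
    }
    return best;
  }
}

// Invented toy data, for illustration only.
const trainingData: Example[] = [
  { text: "thanks for sharing this article", label: "neutral" },
  { text: "i enjoyed the match last night", label: "neutral" },
  { text: "have a great day everyone", label: "neutral" },
  { text: "this is stupid garbage", label: "offensive" },
  { text: "what a dumb idiotic take", label: "offensive" },
  { text: "that movie was trash", label: "offensive" },
  { text: "those people are stupid and should leave", label: "targeted" },
  { text: "you people are trash get out", label: "targeted" },
  { text: "people like them do not belong here", label: "targeted" },
];

const model = new NaiveBayesClassifier();
model.train(trainingData);

console.log(model.predict("what stupid garbage"));       // offensive
console.log(model.predict("those people should leave")); // targeted
```

A bag-of-words model like this only picks up on word frequency, so it tends to confuse insults with targeted attacks whenever they share vocabulary. That limitation is one plausible reason to pair such baselines with context-aware models like BERT and DistilBERT.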

NLP, BERT, Hate Speech Detection, Machine Learning