Toxic Language Filter

This project’s goal is to build a classifier that accurately detects and categorizes toxic comments on social media platforms. The model will be trained on a pre-labeled dataset of toxic and non-toxic online comments so that it can recognize different levels of toxicity.
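
For illustration, a minimal binary baseline on a pre-labeled dataset might look like the sketch below. This is not the project’s actual implementation; the file name `train.csv` and the column names `comment_text` and `toxic` are assumptions about the data layout.

```python
# Minimal baseline sketch: TF-IDF features + logistic regression for
# binary toxic / non-toxic classification.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical pre-labeled data: one text column and one 0/1 toxicity label.
df = pd.read_csv("train.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["comment_text"], df["toxic"], test_size=0.2, random_state=42
)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
    ("lr", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```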

At the most fundamental level, the project aims to develop an NLP model that accurately performs binary classification between toxic and non-toxic comments. Beyond that, it aims to identify distinct categories of toxicity, such as identity-based hate, threats, and insults. Ultimately, the goal is a generalized, scalable language model that can classify different types of toxic comments across platforms. Reaching that goal would require training on multiple datasets, fine-tuning the model for high accuracy, and optimizing performance; some of this was beyond the scope of the course, so we adjusted the implementation to the time and resources available to us.
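
As a sketch of the category step, one common approach is to train one binary classifier per toxicity category (multi-label classification). The category names below follow the Jigsaw toxic-comment dataset convention and are an assumption about this project’s labels, as is the `train.csv` layout.

```python
# Multi-label sketch: one-vs-rest logistic regression over shared
# TF-IDF features, producing a per-category toxicity score.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Assumed category columns (Jigsaw-style), each a 0/1 indicator.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

df = pd.read_csv("train.csv")  # hypothetical path
model = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=50_000, ngram_range=(1, 2))),
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
model.fit(df["comment_text"], df[LABELS])

# Per-category probabilities for a new comment.
probs = model.predict_proba(["example comment text"])
print(dict(zip(LABELS, probs[0])))
```

A transformer fine-tune could replace the linear model here, but the one-vs-rest structure (one score per category) is the same idea the category goal describes.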

Project code can be found here.