Document Classification System

By Dheeraj Kumar Bhaskar, March 29, 2025

The Document Classification System is a Python-based project that classifies text documents into multiple categories using machine learning algorithms implemented from scratch.

Features

  • Text Preprocessing: Tokenization, stopword removal, and a Bag-of-Words representation with Laplace (add-one) smoothing.
  • Multi-Category Classification: Classifies documents across multiple categories.
  • Algorithms Implemented: Naive Bayes, k-Nearest Neighbors (KNN), and Decision Tree, all written from scratch.
  • Performance Evaluation: Measures and compares the accuracy of each algorithm.
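The preprocessing and Naive Bayes steps listed above can be sketched as follows. This is a minimal illustration, not the project's implementation: it assumes a multinomial Naive Bayes model over bag-of-words counts, and it swaps NLTK's stopword corpus for a tiny hand-written `STOPWORDS` set so it runs without downloading NLTK data. The names `tokenize`, `NaiveBayes`, and the `alpha` parameter are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a small hand-written stopword list stands in for NLTK's
# stopwords corpus so this sketch needs no data download.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "on", "for"}


def tokenize(text):
    """Lowercase, split into alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (hypothetical sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # alpha=1.0 is classic add-one (Laplace) smoothing

    def fit(self, docs, labels):
        self.vocab = set()
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_counts = Counter(labels)      # class -> number of documents
        for doc, label in zip(docs, labels):
            bag = Counter(tokenize(doc))         # bag of words for one document
            self.word_counts[label].update(bag)
            self.vocab.update(bag)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def log_posterior(self, doc, c):
        # log P(c) + sum of log P(w | c), each smoothed with alpha
        score = math.log(self.class_counts[c] / self.n_docs)
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[c][w] + self.alpha) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log posterior."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and smoothing keeps a single unseen class/word pair from driving a class probability to zero.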
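A from-scratch KNN classifier over bag-of-words vectors might look like the sketch below. It assumes NumPy count vectors, cosine similarity as the distance measure, and majority voting among the `k` nearest training documents; the project may use a different metric or `k`. The helpers `build_vocab`, `to_bow`, and `knn_predict` are hypothetical names, and documents are passed in already tokenized.

```python
from collections import Counter

import numpy as np


def build_vocab(token_lists):
    """Map each distinct training token to a column index."""
    vocab = sorted({t for tokens in token_lists for t in tokens})
    return {w: i for i, w in enumerate(vocab)}


def to_bow(token_lists, vocab):
    """Bag-of-words count matrix: one row per document, one column per word."""
    X = np.zeros((len(token_lists), len(vocab)))
    for row, tokens in enumerate(token_lists):
        for t in tokens:
            if t in vocab:  # words outside the training vocabulary are dropped
                X[row, vocab[t]] += 1
    return X


def knn_predict(X_train, y_train, X_query, k=3):
    """Majority vote among the k training rows with the highest cosine similarity."""
    def normalize(M):
        norms = np.linalg.norm(M, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # avoid dividing by zero for empty documents
        return M / norms

    sims = normalize(X_query) @ normalize(X_train).T  # shape (n_query, n_train)
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:k]  # indices of the k most similar documents
        votes = Counter(y_train[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds
```

Cosine similarity ignores document length, which suits raw word counts; an odd `k` avoids ties when there are only two categories.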
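For the Decision Tree, one common from-scratch approach is an ID3-style tree that splits on whether a word is present, choosing the split with the highest information gain. The sketch below assumes binary word-presence features (each document as a set of words) and a hypothetical `max_depth` cap; it is an illustration of the technique, not the project's code, and `entropy`, `best_split`, `build_tree`, and `tree_predict` are hypothetical names.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Return the word whose presence/absence gives the highest information gain."""
    base = entropy(labels)
    best, best_gain = None, 0.0
    for f in sorted(features):  # sorted for deterministic tie-breaking
        yes = [y for r, y in zip(rows, labels) if f in r]
        no = [y for r, y in zip(rows, labels) if f not in r]
        if not yes or not no:
            continue  # this word does not separate the documents
        remainder = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = f, gain
    return best


def build_tree(rows, labels, features, depth=0, max_depth=5):
    """rows: list of word sets. Returns a label (leaf) or (word, present, absent)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    f = best_split(rows, labels, features)
    if f is None:
        return majority  # no split improves purity
    yes = [(r, y) for r, y in zip(rows, labels) if f in r]
    no = [(r, y) for r, y in zip(rows, labels) if f not in r]
    remaining = features - {f}
    return (
        f,
        build_tree([r for r, _ in yes], [y for _, y in yes], remaining, depth + 1, max_depth),
        build_tree([r for r, _ in no], [y for _, y in no], remaining, depth + 1, max_depth),
    )


def tree_predict(tree, doc_words):
    """Walk from the root to a leaf by testing word presence at each node."""
    while isinstance(tree, tuple):
        word, present, absent = tree
        tree = present if word in doc_words else absent
    return tree
```

Limiting depth is a simple guard against overfitting on sparse text features, where a deep tree can memorize individual training documents.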
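The evaluation step can be sketched as scoring every classifier on the same held-out documents. This minimal version assumes each classifier is exposed as a function that takes one document and returns a category; `accuracy`, `confusion_matrix`, and `compare` are hypothetical helpers, and the confusion matrix is an optional extra for seeing which categories get mixed up.

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted category matches the true one."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def confusion_matrix(y_true, y_pred, classes):
    """Counts with rows = true category and columns = predicted category."""
    index = {c: i for i, c in enumerate(classes)}
    M = np.zeros((len(classes), len(classes)), dtype=int)
    for t, p in zip(y_true, y_pred):
        M[index[t], index[p]] += 1
    return M


def compare(classifiers, test_docs, test_labels):
    """classifiers: dict of name -> predict function taking a single document."""
    return {
        name: accuracy(test_labels, [predict(d) for d in test_docs])
        for name, predict in classifiers.items()
    }
```

Using one shared test split for every algorithm keeps the accuracy figures directly comparable.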

Technologies Used

  • Python
  • NumPy
  • Matplotlib
  • NLTK

Getting Started

Clone the repository and install dependencies:

git clone <repository-url>
cd document-classification
pip install -r requirements.txt