Document Classification System
By Dheeraj Kumar Bhaskar • March 29, 2025
The Document Classification System is a Python-based project that classifies text documents into multiple categories using machine learning algorithms implemented from scratch.
Features
- Text Preprocessing: Tokenization, stopword removal, and Bag of Words features with Laplace-smoothed word counts.
- Multi-Category Classification: Supports multiple categories for documents.
- Algorithms Implemented: Naive Bayes, k-Nearest Neighbors (KNN), and Decision Tree, each written from scratch.
- Performance Evaluation: Measures accuracy and effectiveness of each algorithm.
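To make the features above concrete, here is a minimal sketch of how a from-scratch pipeline of this kind might look: tokenization with stopword removal, Bag of Words counts, a multinomial Naive Bayes classifier with Laplace smoothing, and an accuracy measure. It is an illustration under stated assumptions, not the project's actual code. The function names (`tokenize`, `train_naive_bayes`, `predict`, `accuracy`), the `alpha` parameter, and the tiny `STOPWORDS` set (standing in for NLTK's stopword list) are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for NLTK's English stopword list (assumption).
STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (w.strip(".,!?;:\"'()") for w in text.lower().split())
    return [t for t in tokens if t and t not in STOPWORDS]


def train_naive_bayes(docs, labels):
    """Build Bag of Words counts per category from labeled documents."""
    class_counts = Counter(labels)       # number of documents per category
    word_counts = defaultdict(Counter)   # category -> word -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, text, alpha=1.0):
    """Return the category with the highest log-posterior score.

    alpha is the Laplace smoothing constant: it is added to every word
    count so that a word unseen in a category never yields log(0).
    """
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unknown words
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted category matches the label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)
```

A model would be built with `train_naive_bayes(docs, labels)` on a list of texts and their category labels, then queried with `predict(model, "some new text")`. Working in log space avoids numeric underflow when many small probabilities are multiplied.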
Technologies Used
- Python
- NumPy
- Matplotlib
- NLTK
Getting Started
Clone the repository and install dependencies:
git clone <repository-url>
cd document-classification
pip install -r requirements.txt