NLP URL Categorizer

Overview
This is a Python tool I built to classify web links into predefined categories like programming, technology, sports, and shopping. It scrapes web pages, cleans and processes the content, and matches it against a categorized corpus using probabilistic models.
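As a rough illustration of that pipeline, here is a minimal sketch that strips a page down to tokens and scores it against a tiny categorized corpus. It makes several assumptions: the probabilistic model is taken to be a multinomial Naive Bayes classifier with add-one smoothing, the standard library's `html.parser` stands in for Scrapy/BeautifulSoup, a hand-written stopword set stands in for NLTK, and `TOY_CORPUS` stands in for the Wikipedia dataset. All names here (`TextExtractor`, `html_to_tokens`, `train`, `classify`) are hypothetical and not taken from the project's code.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stopword set (assumption; a real run would likely use NLTK's list).
STOPWORDS = {"a", "an", "the", "and", "of", "to", "in", "with", "is", "for"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_tokens(html):
    """Clean step: strip markup, lowercase, keep alphabetic non-stopword tokens."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return [t for t in re.findall(r"[a-z]+", text) if t not in STOPWORDS]


def train(corpus):
    """Count words per category. corpus maps category -> list of token lists."""
    word_counts = {cat: Counter() for cat in corpus}
    doc_counts = Counter()
    vocab = set()
    for cat, docs in corpus.items():
        for tokens in docs:
            word_counts[cat].update(tokens)
            doc_counts[cat] += 1
            vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(tokens, model):
    """Return the category with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for cat, counts in word_counts.items():
        score = math.log(doc_counts[cat] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore words never seen in training
                score += math.log((counts[tok] + 1) / denom)
        scores[cat] = score
    return max(scores, key=scores.get)


# Toy stand-in for the categorized corpus (assumption, not the real dataset).
TOY_CORPUS = {
    "programming": [
        "python function compiler code variable debug".split(),
        "code library api function software".split(),
    ],
    "sports": [
        "football team score match league goal".split(),
        "player coach match season team".split(),
    ],
}

model = train(TOY_CORPUS)
page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<p>The team won the match with a late goal.</p>"
    "<script>var code = 1;</script></body></html>"
)
tokens = html_to_tokens(page)
label = classify(tokens, model)
print(tokens, label)
```

Working in log space keeps the product of many small word probabilities from underflowing, and the smoothing term keeps a single unseen word from zeroing out a category.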
This project was a fun way for me to work with natural language processing (NLP) and build something kind of useful. I’m also planning to build a more practical version soon with Apple’s Core ML, for auto-categorization in a time-tracking productivity app I’m working on.
Technologies
- Scraping: Scrapy, BeautifulSoup
- NLP: NLTK
- Dataset: Categorized Wikipedia articles