NLP URL Categorizer

Overview

This is a Python tool I built to classify web links into predefined categories such as programming, technology, sports, and shopping. It scrapes each page, cleans and processes the content, and matches it against a categorized corpus using probabilistic models.
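To make the scrape, clean, and match steps concrete, here is a minimal sketch of that kind of pipeline. It is not the project's actual code. It assumes a multinomial Naive Bayes model with add-one smoothing and a uniform prior as the "probabilistic model", uses only the standard library in place of Scrapy, BeautifulSoup, and NLTK, and runs on a tiny made-up corpus. All names in it (`TextExtractor`, `tokenize`, `train`, `classify`, `TOY_CORPUS`) are hypothetical.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


# Tiny illustrative stopword list (an assumption, not NLTK's list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def tokenize(html):
    """Strip markup, lowercase, keep alphabetic words, drop stopwords."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.parts).lower())
    return [w for w in words if w not in STOPWORDS]


def train(corpus):
    """corpus maps category -> list of documents; returns per-category word counts."""
    return {
        cat: Counter(w for doc in docs for w in tokenize(doc))
        for cat, docs in corpus.items()
    }


def classify(html, counts):
    """Return the category with the highest smoothed log-likelihood."""
    vocab = set().union(*counts.values())
    tokens = tokenize(html)
    scores = {}
    for cat, word_counts in counts.items():
        total = sum(word_counts.values())
        # Add-one (Laplace) smoothing so unseen words do not zero out a category.
        scores[cat] = sum(
            math.log((word_counts[w] + 1) / (total + len(vocab))) for w in tokens
        )
    return max(scores, key=scores.get)


# Made-up stand-in for the categorized corpus.
TOY_CORPUS = {
    "programming": [
        "Python functions, classes and compiler errors",
        "Debugging code with a compiler and unit tests",
    ],
    "sports": [
        "The team scored a late goal in the match",
        "Players and coach celebrate the league win",
    ],
}

counts = train(TOY_CORPUS)
page = (
    "<html><head><script>var goal = 1;</script></head>"
    "<body><h1>Fixing compiler errors in Python code</h1></body></html>"
)
print(classify(page, counts))
```

On this toy input the script prints `programming`. A real version would fetch the page over HTTP and train on a much larger corpus. With only a handful of documents the scores mostly reflect which category happens to share words with the page.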

This project was a fun way for me to work with Natural Language Processing and build something reasonably useful. I’m also planning to build a more practical version with Apple’s Core ML, to power auto-categorization in a time-tracking productivity app I’m working on.

Technologies

  • Scraping: Scrapy, BeautifulSoup
  • NLP: NLTK
  • Dataset: Categorized Wikipedia articles