Web mining and NLP module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
Pattern is a web mining and NLP nlp module for Python. It has tools for:
- Data Mining: web services (Google, Twitter, Wikipedia), web crawler, HTML DOM parser
- Natural Language Processing: part-of-speech taggers, n-gram search, sentiment analysis, WordNet
- Machine Learning: vector space model, clustering, classification (KNN, SVM, Perceptron)
- Network Analysis: graph centrality and visualization.
It is well documented, thoroughly tested with 350+ unit tests and comes bundled with 50+ examples. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern.