Reference Site: https://textacy.readthedocs.io/en/stable/
Features
- Stream text, JSON, CSV, and spaCy binary data to and from disk
- Clean and normalize raw text before analyzing it
- Explore a variety of included datasets, with both text data and metadata, from Congressional speeches to historical literature to Reddit comments
- Access and filter basic linguistic elements, such as words, ngrams, noun chunks, and sentences
- Extract named entities, acronyms and their definitions, direct quotations, key terms, and more from documents
- Compare strings, sets, and documents by a variety of similarity metrics
- Transform documents and corpora into vectorized and semantic network representations
- Train, interpret, visualize, and save sklearn-style topic models using LSA, LDA, or NMF methods
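
To make a few of the features above concrete (cleaning raw text, pulling out ngrams, and comparing documents with a similarity metric), here is a minimal sketch that uses only the Python standard library. It is not textacy's API: the helper names `normalize_text`, `word_ngrams`, and `jaccard` are hypothetical and exist only for illustration, and the word splitting is a simple regex rather than spaCy tokenization. See the reference site for the library's actual functions.

```python
import re
import unicodedata


def normalize_text(raw: str) -> str:
    """Apply Unicode NFKC normalization, collapse whitespace, and lowercase.

    Hypothetical helper for illustration; not part of textacy.
    """
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.lower()


def word_ngrams(text: str, n: int) -> list:
    """Return consecutive word n-grams as tuples, using a naive regex word split."""
    words = re.findall(r"\w+", text)
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0  # treat two empty sets as identical
    return len(a & b) / len(a | b)


# Clean two raw strings, then compare them by their shared bigrams.
doc_a = normalize_text("  The quick   brown fox\njumps over the lazy dog. ")
doc_b = normalize_text("The quick brown fox sleeps near the lazy dog.")

bigrams_a = set(word_ngrams(doc_a, 2))
bigrams_b = set(word_ngrams(doc_b, 2))

similarity = jaccard(bigrams_a, bigrams_b)
print(f"bigram Jaccard similarity: {similarity:.2f}")
```

The two sample sentences share five of eleven distinct bigrams, so the score falls a little under one half. A real pipeline would replace the regex split with proper tokenization, which is the part a spaCy-based library handles.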