Questions tagged [nltk]

The Natural Language Toolkit (NLTK) is a Python library for computational linguistics. It is currently available for Python 2.7 and 3.2+.

NLTK includes a great number of common natural language processing tools, including a tokenizer, a chunker, a part-of-speech (POS) tagger, a stemmer, a lemmatizer, and various classifiers such as Naive Bayes and decision trees. In addition to these tools, NLTK ships many common corpora, including the Brown Corpus, Reuters, and WordNet. The corpora collection also includes a few non-English corpora in Portuguese, Polish, and Spanish.

The book Natural Language Processing with Python - Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper is freely available online under the Creative Commons Attribution Noncommercial No Derivative Works 3.0 US license. A citable paper, NLTK: The Natural Language Toolkit, was published in 2003 and again in 2006, allowing researchers to acknowledge NLTK's contribution in ongoing computational linguistics research.

NLTK is currently distributed under the Apache License, version 2.0.

7139 questions
351 votes · 7 answers

What is "entropy and information gain"?

I am reading this book (NLTK) and it is confusing. Entropy is defined as: Entropy is the sum of the probability of each label times the log probability of that same label How can I apply entropy and maximum entropy in terms of text mining? Can…
TIMEX
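The definition quoted above translates directly into a few lines of Python; a minimal sketch (the function name and sample labels are my own, not from the book):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a label distribution: -sum over labels of p * log2(p)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

print(entropy(["spam", "spam", "ham", "ham"]))  # two equally likely labels -> 1.0
print(entropy(["spam", "spam", "spam", "spam"]))  # a single label -> 0.0
```

A uniform label distribution maximizes entropy, while a single repeated label gives 0; that is why entropy works as an impurity measure when choosing decision-tree splits.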
202 votes · 14 answers

What is the difference between lemmatization and stemming?

When do I use each? Also, is the NLTK lemmatization dependent upon part of speech? Wouldn't it be more accurate if it was?
TIMEX
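A toy contrast (these are illustrative stand-ins, not NLTK's actual algorithms): a stemmer chops suffixes by rule, while a lemmatizer maps words to dictionary forms, which is why a lemmatizer benefits from knowing the part of speech.

```python
# Toy illustration only: a crude rule-based stemmer vs. a lookup-based lemmatizer.
SUFFIXES = ("ing", "ly", "ed", "s")                # assumption: tiny rule set
LEMMAS = {"better": "good", "was": "be"}           # assumption: toy lookup table

def toy_stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix):
            return word[: -len(suffix)]            # chop blindly, meaning ignored
    return word

def toy_lemmatize(word):
    return LEMMAS.get(word, word)                  # dictionary lookup

print(toy_stem("meeting"))      # 'meet' -- even when "meeting" is a noun
print(toy_lemmatize("better"))  # 'good' -- a stemmer could never produce this
```

In NLTK itself, nltk.stem.PorterStemmer and nltk.stem.WordNetLemmatizer play these roles; WordNetLemmatizer.lemmatize() defaults to treating words as nouns, so passing the right pos argument generally does improve accuracy.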
188 votes · 9 answers

What are all possible POS tags of NLTK?

How do I find a list with all possible POS tags used by the Natural Language Toolkit (NLTK)?
OrangeTux
187 votes · 18 answers

Failed loading english.pickle with nltk.data.load

When trying to load the punkt tokenizer... import nltk.data tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle') ...a LookupError was raised: > LookupError: > ********************************************************************* …
Martin
187 votes · 12 answers

How to check if a word is an English word with Python?

I want to check in a Python program if a word is in the English dictionary. I believe the nltk wordnet interface might be the way to go but I have no clue how to use it for such a simple task. def is_english_word(word): pass # how do I implement…
Barthelemy
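One common approach is a set-membership test against a word list; a sketch with a tiny stand-in set (in practice the set could be built from nltk.corpus.words or WordNet, both of which require a corpus download first):

```python
# Stand-in word set; in practice, load e.g. set(nltk.corpus.words.words()).
ENGLISH_WORDS = {"cat", "dog", "house", "run", "python"}

def is_english_word(word):
    # Lowercase so capitalized forms like "Cat" still match the list.
    return word.lower() in ENGLISH_WORDS

print(is_english_word("Python"))      # True
print(is_english_word("qwertyuiop")) # False
```

Note that a WordNet-based check (wordnet.synsets(word)) misses function words such as "the", since WordNet only covers open-class words, so a plain word list is often the better fit for this task.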
174 votes · 17 answers

n-grams in python, four, five, six grams?

I'm looking for a way to split a text into n-grams. Normally I would do something like: import nltk from nltk import bigrams string = "I really like python, it's pretty awesome." string_bigrams = bigrams(string) print string_bigrams I am aware that…
Shifu
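A sketch generalizing bigrams to arbitrary n using zip over shifted token lists (note that passing a raw string, as in the question's code, makes NLTK iterate over characters; tokenize into words first):

```python
def ngrams(text, n):
    """Word n-grams via zip over n shifted copies of the token list."""
    tokens = text.split()  # crude tokenization; nltk.word_tokenize is an option
    return list(zip(*(tokens[i:] for i in range(n))))

print(ngrams("I really like python", 3))
# [('I', 'really', 'like'), ('really', 'like', 'python')]
```

NLTK also ships nltk.ngrams(tokens, n) for exactly this.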
162 votes · 12 answers

How to get rid of punctuation using NLTK tokenizer?

I'm just starting to use NLTK and I don't quite understand how to get a list of words from text. If I use nltk.word_tokenize(), I get a list of words and punctuation. I need only the words instead. How can I get rid of punctuation? Also…
lizarisk
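word_tokenize deliberately keeps punctuation as tokens; matching word characters instead sidesteps the problem. A sketch (the regex here is a simplification):

```python
import re

def words_only(text):
    # Keep runs of letters (and internal apostrophes); punctuation never matches.
    return re.findall(r"[A-Za-z']+", text)

print(words_only("Hello, world! It's me."))  # ['Hello', 'world', "It's", 'me']
```

NLTK's RegexpTokenizer(r"\w+") does the same with a tokenizer interface; alternatively, filter the output of word_tokenize with str.isalpha().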
139 votes · 13 answers

How to remove stop words using nltk or python

I have a dataset from which I would like to remove stop words. I used NLTK to get a list of stop words: from nltk.corpus import stopwords stopwords.words('english') Exactly how do I compare the data to the list of stop words, and thus remove the…
Alex
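The comparison is just a membership test per token; a sketch with a stand-in stop-word set (in practice, build it with set(stopwords.words('english'))):

```python
# Stand-in for set(nltk.corpus.stopwords.words('english')).
STOPWORDS = {"the", "a", "is", "of", "and"}

def remove_stopwords(tokens):
    # Lowercase each token for comparison, since the NLTK list is lowercase.
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["The", "cat", "is", "black"]))  # ['cat', 'black']
```

Converting the stop list to a set matters on large datasets: list membership is O(n) per token, set membership is O(1).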
134 votes · 10 answers

How to check which version of nltk and scikit-learn is installed?

In shell script I am checking whether these packages are installed or not, if not installed then install them. So within shell script: import nltk echo nltk.__version__ but it stops shell script at import line in linux terminal tried to see in this…
nlper
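The snippet in the question mixes shell and Python syntax: import is a Python statement, not a shell command, so the shell script fails at that line. A sketch of doing the check from Python itself (the helper name is my own):

```python
import importlib

def package_version(name):
    """Return a package's __version__ attribute, or None if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)

print(package_version("nltk"))  # version string if installed, otherwise None
```

From a shell script, keep the Python in a one-liner instead, e.g. python -c "import nltk; print(nltk.__version__)".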
131 votes · 28 answers

pip issue installing almost any library

I have a difficult time using pip to install almost anything. I'm new to coding, so I thought maybe this is something I've been doing wrong and have opted out to easy_install to get most of what I needed done, which has generally worked. However,…
contentclown
126 votes · 5 answers

re.sub fails with "expected string or bytes-like object"

I have read multiple posts regarding this error, but I still can't figure it out. When I try to loop through my function: def fix_Plan(location): letters_only = re.sub("[^a-zA-Z]", # Search for all non-letters " ", …
imanexcelnoob
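This error usually means a non-string value, typically a NaN float from a pandas column, reached re.sub; guarding on type fixes it. A sketch (fix_plan mirrors the question's function, the guard is the addition):

```python
import re

def fix_plan(location):
    # re.sub raises TypeError on non-strings, e.g. NaN floats from pandas;
    # skip (or coerce with str()) anything that is not already a str.
    if not isinstance(location, str):
        return ""
    return re.sub("[^a-zA-Z]", " ", location)  # replace all non-letters

print(fix_plan("a1b"))         # 'a b'
print(fix_plan(float("nan")))  # ''
```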
123 votes · 19 answers

Resource u'tokenizers/punkt/english.pickle' not found

My Code: import nltk.data tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle') ERROR Message: [ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py Traceback (most recent call last): File "mapper_local_v1.0.py", line 16,…
Supreeth Meka
112 votes · 6 answers

Python: tf-idf-cosine: to find document similarity

I was following a tutorial which was available at Part 1 & Part 2. Unfortunately the author didn't have the time for the final section which involved using cosine similarity to actually find the distance between two documents. I followed the…
add-semi-colons
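The missing final step is a dot product over shared terms divided by the two vector norms; a sketch over raw term frequencies (real tf-idf would weight the counts by inverse document frequency first):

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents over raw term-frequency vectors."""
    va, vb = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())  # shared terms only
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

print(cosine_similarity("the cat sat", "the cat ran"))
```

Identical documents score 1.0 and documents sharing no terms score 0.0, so the value can be read directly as a similarity rather than a distance.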
108 votes · 7 answers

NLTK python error: "TypeError: 'dict_keys' object is not subscriptable"

I'm following instructions for a class homework assignment and I'm supposed to look up the top 200 most frequently used words in a text file. Here's the last part of the code: fdist1 = FreqDist(NSmyText) vocab=fdist1.keys() vocab[:200] But when I…
user3760644
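In Python 3, dict.keys() returns a view object that cannot be sliced, which is what the book's Python 2 code assumes; wrapping it in list(), or using most_common(), fixes it. A sketch with Counter standing in for FreqDist (in NLTK 3, FreqDist subclasses Counter):

```python
from collections import Counter

fdist1 = Counter("the cat sat on the mat".split())  # stand-in for FreqDist(text)
vocab = list(fdist1.keys())   # list() makes the keys sliceable again
print(vocab[:2])
print([w for w, _ in fdist1.most_common(1)])  # ['the'] sorted by frequency
```

For "top 200 most frequent words", fdist1.most_common(200) is the idiomatic call, since plain keys() order is not sorted by frequency.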
100 votes · 7 answers

How to config nltk data directory from code?

How to config nltk data directory from code?
Juanjo Conti