Here is my code, which just performs some tokenization with NLTK:
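import nltk
from nltk.corpus import stopwords

# doc holds the text to tokenize (a str), defined elsewhere
tokens = nltk.word_tokenize(doc, language='english')
# remove all the stopwords and any non-alphanumeric tokens;
# build the stopword set once instead of reloading the list per token
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w not in stop_words and w.isalnum()]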
I've already downloaded the punkt package. I also tried copying the correct folder into the locations that the error message says it searched. Here is the error, which I have also seen in other similar questions:
Resource u'tokenizers/punkt/english.pickle' not found.
Please use the NLTK Downloader to obtain the resource:
>>> nltk.download()
Searched in:
- '/root/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- u''
I even tried reinstalling NLTK and its packages, but it didn't work. Useful information about the environment:
- run through the terminal of the PyCharm IDE
- operating system: Ubuntu 15
- nltk installed using pip
- nltk_data installed in the default location /home/user/nltk_data
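To make the comparison between the searched directories and the install location concrete, here is a minimal sketch (not part of my original script). The directory list is copied from the error message and the install path from the environment notes above. The helper names `is_searched` and `locate` are made up for illustration; `nltk.data.path` is the list of directories NLTK searches, and appending to it is only something I assume might help, not a confirmed fix.

```python
import os

# Directories listed under "Searched in:" in the error message.
SEARCHED = [
    "/root/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]

# Where nltk_data was installed, per the environment notes.
INSTALLED = "/home/user/nltk_data"


def is_searched(directory, search_dirs):
    """Hypothetical helper: is `directory` one of the searched directories?"""
    normalized = {os.path.normpath(d) for d in search_dirs}
    return os.path.normpath(directory) in normalized


def locate(resource, search_dirs):
    """Hypothetical helper: list the search directories that contain `resource`."""
    return [d for d in search_dirs if os.path.exists(os.path.join(d, resource))]


# The install location is not among the searched directories.
print(is_searched(INSTALLED, SEARCHED))  # prints False

# Assumption: adding the install directory to NLTK's search path may help.
try:
    import nltk
    nltk.data.path.append(INSTALLED)
except ImportError:
    pass  # nltk is not available in this interpreter
```

Calling `locate("tokenizers/punkt/english.pickle", SEARCHED + [INSTALLED])` on the affected machine would show which of these directories, if any, actually holds the pickle file.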
Please don't tell me to use nltk.download('punkt'), because I already have it. Thanks for your help.