Why does importing `word_tokenize` from NLTK work in the interpreter but not in my script?

Question

I am trying to tokenize a sentence using NLTK. When I do it in the Python shell, I get the correct result.

>>> import nltk
>>> sentence = "Mohanlal made his acting debut in Thiranottam (1978), but the film got released only after 25 years due to censorship issues."
>>> tokens = nltk.word_tokenize(sentence)
>>> tokens
['Mohanlal', 'made', 'his', 'acting', 'debut', 'in', 'Thiranottam', '(', '1978', ')', ',', 'but', 'the', 'film', 'got', 'released', 'only', 'after', '25', 'years', 'due', 'to', 'censorship', 'issues', '.']

But when I put the same code in a file and run it, I get the following error:

Traceback (most recent call last):
  File "tokenize.py", line 1, in <module>
    import nltk
  File "/usr/local/lib/python2.7/dist-packages/nltk/__init__.py", line 114, in <module>
    from nltk.collocations import *
  File "/usr/local/lib/python2.7/dist-packages/nltk/collocations.py", line 38, in <module>
    from nltk.util import ngrams
  File "/usr/local/lib/python2.7/dist-packages/nltk/util.py", line 13, in <module>
    import pydoc
  File "/usr/lib/python2.7/pydoc.py", line 55, in <module>
    import sys, imp, os, re, types, inspect, __builtin__, pkgutil, warnings
  File "/usr/lib/python2.7/inspect.py", line 39, in <module>
    import tokenize
  File "/home/gadheyan/Project/Codes/tokenize.py", line 2, in <module>
    from nltk import word_tokenize
ImportError: cannot import name word_tokenize

Here's the code that I run:

import nltk
from nltk import word_tokenize

sentence = "Mohanlal made his acting debut in Thiranottam (1978), but the film got released only after 25 years due to censorship issues."
tokens = nltk.word_tokenize(sentence)
print tokens

Are you using an IDE that might be running a different version of Python? — Tyler, Oct 02 '15 at 06:08
Check out this too: http://stackoverflow.com/questions/23155704/python-failed-to-import-nltk-but-works-with — Hayley Guillou, Oct 02 '15 at 07:01

score 5 · Answer 1 · edited May 23 '17 at 10:26

TL;DR

It's a naming problem; see "Python failed to `import nltk` in my script but works in the interpreter".

Rename your file to my_tokenize.py instead of tokenize.py, i.e.

$ mv /home/gadheyan/Project/Codes/tokenize.py /home/gadheyan/Project/Codes/my_tokenize.py
$ python my_tokenize.py
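A quick way to check whether a script name is likely to cause this kind of clash is sketched below. The helper `shadows_stdlib` is a hypothetical name, not part of NLTK or the standard library, and the sketch is written for Python 3 (it uses `sys.stdlib_module_names`, available from Python 3.10, with a rough directory lookup as a fallback on older versions).

```python
import os
import sys
import sysconfig


def shadows_stdlib(script_path):
    """Return True if the script's file name matches a standard library module."""
    module_name = os.path.splitext(os.path.basename(script_path))[0]
    known = getattr(sys, "stdlib_module_names", None)  # Python 3.10+
    if known is not None:
        return module_name in known
    # Fallback for older interpreters: look for the module in the stdlib directory.
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    return (os.path.isfile(os.path.join(stdlib_dir, module_name + ".py"))
            or os.path.isdir(os.path.join(stdlib_dir, module_name)))


for candidate in ("tokenize.py", "my_tokenize.py"):
    print(candidate, "shadows a stdlib module:", shadows_stdlib(candidate))
```

It only compares names, so it will not catch clashes with third-party packages installed in site-packages.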

In detail:

From your traceback, you see:

  File "/usr/lib/python2.7/inspect.py", line 39, in <module>
    import tokenize
  File "/home/gadheyan/Project/Codes/tokenize.py", line 2, in <module>
    from nltk import word_tokenize

In NLTK, there is a package called nltk.tokenize, where nltk.word_tokenize resides: http://www.nltk.org/_modules/nltk/tokenize.html

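But the `import tokenize` in the traceback comes from inspect.py, which is asking for the standard library's tokenize module, not nltk.tokenize. Because the directory containing your script comes first on the module search path, Python finds your script (/home/gadheyan/Project/Codes/tokenize.py) and imports it in place of the standard library module. Your script then runs again from the top while nltk is still only partly initialized, so `from nltk import word_tokenize` fails with the ImportError: the name has not been defined yet.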
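The search-order effect can be reproduced without NLTK. The following is a minimal sketch in Python 3 syntax (the question uses Python 2.7, where the search order works the same way); `resolve_origin` is a hypothetical helper, and the temporary directory stands in for the folder that holds the script.

```python
import importlib
import os
import sys
import tempfile
from importlib.machinery import PathFinder


def resolve_origin(module_name, search_path):
    """Return the file a module name would be loaded from on search_path."""
    importlib.invalidate_caches()
    spec = PathFinder.find_spec(module_name, search_path)
    return spec.origin if spec is not None else None


# Where the standard library's tokenize is normally found.
stdlib_origin = resolve_origin("tokenize", sys.path)

with tempfile.TemporaryDirectory() as script_dir:
    # A stand-in for the user's script, named like the stdlib module.
    shadow = os.path.join(script_dir, "tokenize.py")
    with open(shadow, "w") as handle:
        handle.write("# user script\n")

    # Python puts the script's directory first on sys.path, so mimic that.
    shadowed_origin = resolve_origin("tokenize", [script_dir] + sys.path)

print("stdlib tokenize:", stdlib_origin)
print("with a local tokenize.py first:", shadowed_origin)
```

The second line printed points at the temporary tokenize.py rather than the standard library file, which is the same substitution that happens when inspect.py runs `import tokenize` from a script directory containing tokenize.py.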

BTW

Redundant imports still work in Python, but it's better to keep your namespace and global variables clean, i.e. use this:

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk import word_tokenize
>>> sent = 'this is a foo bar sentence'
>>> word_tokenize(sent)
['this', 'is', 'a', 'foo', 'bar', 'sentence']
>>> exit()

Or this:

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> sent = 'this is a foo bar sentence'
>>> nltk.word_tokenize(sent)
['this', 'is', 'a', 'foo', 'bar', 'sentence']
>>> exit()

But try to avoid this (although it still works regardless):

alvas@ubi:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk import word_tokenize
>>> sent = 'this is a foo bar sentence'
>>> word_tokenize(sent)
['this', 'is', 'a', 'foo', 'bar', 'sentence']
>>> nltk.word_tokenize(sent)
['this', 'is', 'a', 'foo', 'bar', 'sentence']
>>> exit()
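The reason the last form is merely redundant, not wrong, is that both imports bind names to the same function object. A small sketch of that, using the standard library's json module as a stand-in for nltk so it runs without NLTK installed (Python 3 syntax):

```python
import json
from json import dumps

# Both names refer to one and the same function object.
same_object = json.dumps is dumps
print(same_object)
```

The same holds for `nltk.word_tokenize` and `word_tokenize` after the two imports shown above, so importing both ways only adds an extra name to the namespace.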

Thank you for clarifying this! I was trying to understand why people import nltk and then continue importing subpackages ugh, I knew it didn't seem right — Anscandance, Jun 27 '22 at 23:14
