
I'm using the nltk library in Python; my background is Java. I don't understand the console output for the code I wrote. Why does Python print this strange form even though I initialized the variable tokens as a list?

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import nltk
def tokenize_sentence(sentence):
    tokens=[]
    tokens = word_tokenize(sentence)

    tokens = (word for word in tokens if word not in \
              set(stopwords.words('english')))
    return tokens;

a="John is an actor."
print(tokenize_sentence(a))

Output:

<generator object tokenize_sentence.<locals>.<genexpr> at 0x10dc5b1a8>

I see this output as something similar to what Java does when I try to print an object for which toString() method is not defined.

Prune
AV94

1 Answer


Initial assignment is not a type declaration. Python variables do not have type declarations; a name can refer to an object of any type. For instance, you're allowed to write

x = 7
x = []
x = "Hello"

... and see x change types with every assignment.

In this case, you have three independent assignments to tokens. Each of these works the same way:

  1. Evaluate the expression on the right side.
  2. Make the variable on the left refer to that value.
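The two steps above can be sketched with a few plain assignments. This is only an illustration: the name first and the sample sentence handling with str.split are assumptions made for the demo, not part of the original code.

```python
tokens = []                      # the name tokens refers to an empty list
first = tokens                   # hypothetical second name for that same list

# Step 1: the right side is evaluated; step 2: tokens is rebound to the result.
tokens = "John is an actor.".split()
print(type(tokens).__name__)     # list (a different list object)
print(first)                     # the original empty list was never modified

# Rebinding again: tokens now refers to a generator, not a list.
tokens = (w for w in tokens if w != "is")
kind = type(tokens).__name__
print(kind)
```

Nothing about the first assignment constrains the later ones; each one simply points the name at a new object.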

The prior value of the variable is discarded. When you built a generator expression -- your (word for word ...) expression -- and assigned it to tokens, the name stopped referring to the earlier list and started referring to the generator; the empty list from the first assignment had already gone to the bit bucket (i.e. garbage collection). When you printed the generator instead of iterating over it, you got Python's default representation of the generator object, not the words it would produce.
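A minimal sketch of the difference between printing a generator and consuming it. The word list and the tiny stop set here are hypothetical stand-ins for the output of word_tokenize and stopwords.words('english'), so the snippet runs without nltk.

```python
words = ["John", "is", "an", "actor", "."]   # assumed tokenizer output
stop = {"is", "an"}                          # hypothetical, much smaller than nltk's list

gen = (w for w in words if w not in stop)
shown = repr(gen)
print(shown)                 # something like <generator object <genexpr> at 0x...>

first_pass = list(gen)       # iterating is what actually produces the values
print(first_pass)            # ['John', 'actor', '.']

second_pass = list(gen)      # a generator can only be consumed once
print(second_pass)           # []
```

The repr is loosely comparable to Java's default Object.toString(): it identifies the object rather than showing its contents.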

As Jim Fasarakis Hilliard already mentioned, if you want a list, then use list comprehension syntax: brackets, not parentheses. Also, did you intend to do anything with the prior value(s) of tokens? At the moment, I don't think the initial tokens=[] assignment has any lasting effect.
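Here is one way the function might look with brackets instead of parentheses. To keep the sketch runnable without nltk, it assumes a small hypothetical STOP_WORDS set and a regex in place of word_tokenize; in the original code you would keep word_tokenize(sentence) and set(stopwords.words('english')).

```python
import re

# Hypothetical stand-in for set(stopwords.words('english')).
STOP_WORDS = {"is", "an", "a", "the"}

def tokenize_sentence(sentence, stop_words=STOP_WORDS):
    # Stand-in for nltk's word_tokenize: words and single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    # Square brackets build the list immediately;
    # parentheses would build a lazy generator instead.
    return [word for word in tokens if word not in stop_words]

result = tokenize_sentence("John is an actor.")
print(result)   # ['John', 'actor', '.']
```

Building the stop-word set once, outside the comprehension, also avoids recreating it for every word, which the original expression may do.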

Prune