
I'm running Python 3.x in a virtualenv, trying to process text with nltk.

I saw this post, What are ngram counts..., and the most upvoted answer has a bit of code using the count() method. But when I copy/paste it into my code:

import nltk
from nltk import bigrams
from nltk import trigrams

text="""Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam tempus vitae. Morbi justo mauris,
congue sit amet imperdiet ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam"""

tokens = nltk.word_tokenize(text)
tokens = [token.lower() for token in tokens if len(token) > 1]
bi_tokens = bigrams(tokens)
tri_tokens = trigrams(tokens)

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])

I receive this message:

AttributeError: 'generator' object has no attribute 'count'

I also found another post about monkeypatching a count method, but I suspect that's unrelated. Any idea what I might be doing wrong?

r_e_cur

3 Answers


It's because nltk.ngrams (which bigrams and trigrams are built on) returns a generator rather than a list, and generators have no count() method. See https://www.python.org/dev/peps/pep-0255/ and What does the "yield" keyword do in Python?

You should use a collections.Counter:

>>> from nltk import ngrams
>>> from collections import Counter
>>> s = "This is a foo bar sentence".split()
>>> Counter(ngrams(s, 3))
Counter({('This', 'is', 'a'): 1, ('a', 'foo', 'bar'): 1, ('is', 'a', 'foo'): 1, ('foo', 'bar', 'sentence'): 1})
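Applied to the question's own goal (sorted (trigram, count) pairs), a minimal sketch might look like the following. It assumes nltk may not be installed, so a hypothetical helper make_trigrams (built on zip) stands in for nltk.trigrams, and the tokens are a short hand-written list instead of word_tokenize output. With NLTK available, you would pass trigrams(tokens) to Counter directly.

```python
from collections import Counter


def make_trigrams(tokens):
    """Hypothetical stand-in for nltk.trigrams: returns a lazy iterator of 3-tuples."""
    return zip(tokens, tokens[1:], tokens[2:])


tokens = "ipsum dolor sit amet ipsum dolor sit amet".split()

# Counter consumes the iterator in a single pass, so .count() is never needed
tri_counts = Counter(make_trigrams(tokens))

# Same shape as the question's intended output: sorted (trigram, count) pairs
result = sorted(tri_counts.items())
print(result)
```

Because Counter does all the counting while it iterates, it works the same whether it is given a generator, an iterator or a list.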
alvas

You get AttributeError: 'generator' object has no attribute 'count' because tri_tokens is a generator, and generators do not have a count() method. A generator is also exhausted after its first pass in Python, so it cannot be iterated twice.

tri_tokens is used twice in your code:

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])

In that line, sorted(set(tri_tokens)) is evaluated first and consumes the whole generator. Then tri_tokens.count(item) raises the AttributeError, because count() is a list method that generators do not have. Even if such a method existed, it would only see an exhausted generator.
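A quick standard-library check shows both that a generator has no count() method at all and that it is empty after one pass. The generator pairs below is a hypothetical stand-in for nltk's bigram generator, used so the example runs without NLTK.

```python
def pairs(seq):
    """Hypothetical generator yielding adjacent pairs, like a bigram generator."""
    for i in range(len(seq) - 1):
        yield (seq[i], seq[i + 1])


gen = pairs(["a", "b", "c"])

# Generators do not provide list methods such as count()
has_count = hasattr(gen, "count")

first_pass = list(gen)   # consumes every item
second_pass = list(gen)  # the generator is now exhausted, so this is empty

print(has_count, first_pass, second_pass)
# False [('a', 'b'), ('b', 'c')] []
```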

So the best way is to convert the generator to a list:

tri_tokens = list(tri_tokens)

Try the code below:

import nltk
from nltk import bigrams
from nltk import trigrams

text="""Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam tempus vitae. Morbi justo mauris,
congue sit amet imperdiet ipsum dolor sit amet, consectetur adipiscing elit. Nullam ornare
tempor lacus, quis pellentesque diam"""

tokens = nltk.word_tokenize(text)
tokens = [token.lower() for token in tokens if len(token) > 1]
bi_tokens = bigrams(tokens)
tri_tokens = trigrams(tokens)

# Materialise the generator: a list can be iterated repeatedly and has count()
tri_tokens = list(tri_tokens)

print([(item, tri_tokens.count(item)) for item in sorted(set(tri_tokens))])
iNikkz

The other answers didn't work for me, so I ended up using:

bi_tokens = list(bigrams(tokens))
tri_tokens = list(trigrams(tokens))

After converting them to lists, you can call count().
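Converting to a list and counting with Counter give the same result. The sketch below compares them, again using a hypothetical zip-based make_bigrams in place of nltk.bigrams. One design point: calling list.count() once per distinct item rescans the whole list each time, while Counter counts everything in one pass, which can matter on long texts.

```python
from collections import Counter


def make_bigrams(tokens):
    """Hypothetical stand-in for nltk.bigrams (returns a lazy iterator)."""
    return zip(tokens, tokens[1:])


tokens = "sit amet sit amet consectetur".split()

# This answer's approach: materialise as a list, then call list.count()
bi_list = list(make_bigrams(tokens))
via_list = [(item, bi_list.count(item)) for item in sorted(set(bi_list))]

# Counter approach: a single pass over a fresh iterator
via_counter = sorted(Counter(make_bigrams(tokens)).items())

print(via_list == via_counter)
# True
```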

Kyek