
For a class, I have to complete a piece of code. It takes a corpus of tokens and is supposed to produce a dictionary whose keys are bigrams (formed from the corpus with nltk.bigrams()) and whose values are the probability of each bigram appearing (based on the frequency of the bigram in my corpus). My solution was:

import nltk
a = nltk.FreqDist(nltk.bigrams("aaaaaaacbegdeg"))

I get a dictionary, but it is trapped inside the following:

FreqDist({('a', 'a'): 6,
          ('a', 'c'): 1,
          ('b', 'e'): 1,
          ('c', 'b'): 1,
          ('d', 'e'): 1,
          ('e', 'g'): 2,
          ('g', 'd'): 1})

How do I take the dictionary out of the FreqDist? Best regards, Bianca

Bianca M.

1 Answer


The nltk.FreqDist object is a subclass of collections.Counter, which is itself a subclass of the built-in dict; see "Difference between Python's collections.Counter and nltk.probability.FreqDist".

You can simply convert it back to a plain dict object like this:

>>> from nltk import FreqDist, bigrams
>>> a = FreqDist(bigrams("aaaaaaacbegdeg"))
>>> a
FreqDist({('a', 'a'): 6, ('e', 'g'): 2, ('d', 'e'): 1, ('c', 'b'): 1, ('b', 'e'): 1, ('a', 'c'): 1, ('g', 'd'): 1})
>>> dict(a)
{('d', 'e'): 1, ('a', 'a'): 6, ('c', 'b'): 1, ('e', 'g'): 2, ('b', 'e'): 1, ('a', 'c'): 1, ('g', 'd'): 1}
>>> b = dict(a)
>>> b
{('d', 'e'): 1, ('a', 'a'): 6, ('c', 'b'): 1, ('e', 'g'): 2, ('b', 'e'): 1, ('a', 'c'): 1, ('g', 'd'): 1}
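
Since the question ultimately asks for probabilities rather than raw counts, here is a minimal sketch of one way to turn the counts into relative frequencies. It uses collections.Counter and zip() as stand-ins for FreqDist and nltk.bigrams() so that it runs without NLTK installed; that substitution, and the names pairs, counts, total and probs, are assumptions for illustration only.

```python
from collections import Counter

text = "aaaaaaacbegdeg"

# Adjacent character pairs, used here as a stand-in for nltk.bigrams(text).
pairs = list(zip(text, text[1:]))

# Counter stands in for nltk.FreqDist, which is a Counter subclass.
counts = Counter(pairs)

# Total number of bigram occurrences in the corpus.
total = sum(counts.values())

# Relative frequency of each bigram: its count divided by the total.
probs = {bigram: n / total for bigram, n in counts.items()}
```

With a real FreqDist the same comprehension should work unchanged, and a.freq(('a', 'a')) should give the relative frequency of a single bigram directly.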

BTW, there's also no need to convert it to a dict object, since it behaves like a dict for both key lookup and the get() method:

>>> a[('a', 'a')]
6
>>> b[('a', 'a')]
6

>>> a.get(('a', 'a'))
6
>>> b.get(('a', 'a'))
6
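
One place where the two can differ is a bigram that never occurred. The sketch below again uses collections.Counter as a stand-in for FreqDist (an assumption, so it runs without NLTK); a FreqDist should behave the same way, since it inherits this behaviour from Counter. The variable names are hypothetical.

```python
from collections import Counter

text = "aaaaaaacbegdeg"
counts = Counter(zip(text, text[1:]))  # stand-in for nltk.FreqDist
plain = dict(counts)                   # plain dict copy of the counts

# A Counter (and therefore a FreqDist) is itself a dict instance.
is_dict = isinstance(counts, dict)

# Indexing a Counter with an unseen key returns 0 instead of raising.
unseen = ("z", "z")
count_for_unseen = counts[unseen]

# Indexing the plain dict with the same key raises KeyError.
try:
    plain[unseen]
    plain_raised = False
except KeyError:
    plain_raised = True

# get() behaves the same on both: None when the key is absent.
get_results = (counts.get(unseen), plain.get(unseen))
```

So if the calling code relies on unseen bigrams counting as zero, keeping the FreqDist (or using get() with a default of 0 on the dict) avoids a KeyError.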
alvas