
I'm fairly new to Python and NLTK. I'm generating bigrams scored by PMI, as per the tutorials here. I want to get the frequency of each generated bigram in the text. This question here suggests using

finder.ngram_fd.viewitems()

My attempt at this using collocations:

import string
import codecs
import nltk
from nltk.collocations import *

bigram_measures = nltk.collocations.BigramAssocMeasures()

data = ''
filename = input("Enter file name\n")
with open(filename, "r", encoding="utf8") as myfile:
    for line in myfile:
        data += line

tokens = nltk.wordpunct_tokenize(data)
finder = BigramCollocationFinder.from_words(tokens)
finder.apply_freq_filter(5)
scored = finder.score_ngrams(bigram_measures.pmi)
a = finder.ngram_fd.viewitems()

The last line gives an error:

AttributeError: 'FreqDist' object has no attribute 'viewitems'

Any idea what should be corrected here or if there's an alternate way to get the frequency when using collocations?

gen_rex

1 Answer


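I found an alternative. viewitems() is a Python 2 dict method and does not exist in Python 3, where items() already returns a view. Instead of using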

a = finder.ngram_fd.viewitems()

I have used:

a = finder.ngram_fd.items()

Since I also wanted to sort by frequency, I used

sorted(finder.ngram_fd.items(), key=lambda x: x[1], reverse=True)

This gives a list of (bigram, count) tuples sorted by frequency. The key function compares on the second element of each tuple (the count), and reverse=True sorts the list in decreasing order.
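As a minimal sketch of the sort described above, the snippet below uses a plain collections.Counter as a stand-in for finder.ngram_fd (in NLTK 3, FreqDist is a Counter subclass, so the same calls should apply). The sample tokens and the names bigram_counts, by_freq and top are made up for illustration and are not part of the original code.

```python
from collections import Counter

# Hypothetical sample tokens; in the question these come from
# nltk.wordpunct_tokenize(data).
tokens = "the cat sat on the mat and the cat ran".split()

# Stand-in for finder.ngram_fd: counts of adjacent token pairs.
bigram_counts = Counter(zip(tokens, tokens[1:]))

# Sort (bigram, count) pairs by count, highest first.
by_freq = sorted(bigram_counts.items(), key=lambda x: x[1], reverse=True)

# Counter.most_common() gives the same ordering without a lambda.
top = bigram_counts.most_common()

for bigram, count in by_freq[:3]:
    print(bigram, count)
```

If only the top few bigrams are needed, most_common(n) with an integer n avoids sorting the whole list by hand.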

gen_rex