I am new to Python and NLTK, and I want to find the frequency of bigrams in a text (a string), then sort the bigrams from highest to lowest frequency. I have found the bigrams and their frequencies using:

import nltk

tokens = nltk.word_tokenize(text)
bgs = nltk.bigrams(tokens)
fdist = nltk.FreqDist(bgs)

But I don't know how to sort it from highest to lowest frequency.

I know it is probably easy, but I can't figure it out. I hope someone can help me!

confused
  • Does this answer your question? [Sorting list based on values from another list](https://stackoverflow.com/questions/6618515/sorting-list-based-on-values-from-another-list) (I didn't flag as duplicate, because I don't have nltk installed and didn't check if `bgs` and `fdist` can be used directly this way.) – Stef Nov 20 '20 at 14:35

1 Answer

You can keep the bigrams and their counts in two separate lists and then sort both lists by count. I shared a link; I hope it is useful for your problem.

An example program that sorts the bigrams by frequency:

import nltk

# tokens comes from nltk.word_tokenize(text), as in the question
bigrams = nltk.bigrams(tokens)
bigrams_freq = nltk.FreqDist(bigrams)

words_bigrams = []
values_bigrams = []

# Split the (bigram, count) pairs into two parallel lists
for bigram, count in bigrams_freq.items():
    words_bigrams.append(bigram)
    values_bigrams.append(count)

def sort_them(w, v):
    # Pair each count with its bigram, then sort by count, biggest -> smallest
    pairs = sorted(zip(v, w), key=lambda pair: pair[0], reverse=True)
    words = [word for _, word in pairs]
    values = [value for value, _ in pairs]
    return words, values

sorted_words, sorted_values = sort_them(words_bigrams, values_bigrams)
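
As a shorter alternative to the two-list approach, here is a minimal sketch that relies on `most_common()`. It uses `collections.Counter` from the standard library so it runs without NLTK installed; in NLTK 3, `nltk.FreqDist` is a subclass of `Counter`, so `fdist.most_common()` from the question should behave the same way. The function name `sorted_bigram_counts`, the sample sentence, and the use of `str.split()` in place of `nltk.word_tokenize` are assumptions made only for illustration.

```python
from collections import Counter

def sorted_bigram_counts(tokens):
    """Return (bigram, count) pairs ordered from most to least frequent."""
    # Pair each token with its successor, like nltk.bigrams(tokens)
    bigrams = zip(tokens, tokens[1:])
    # most_common() with no argument returns every pair, highest count first
    return Counter(bigrams).most_common()

# Hypothetical sample text; str.split() stands in for nltk.word_tokenize
tokens = "the cat sat on the mat and the cat slept".split()
for bigram, count in sorted_bigram_counts(tokens):
    print(bigram, count)
```

With the variables from the question, the equivalent call would be `fdist.most_common()`, which returns a list of `(bigram, count)` tuples already sorted from highest to lowest count.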