5

Is it possible to calculate the relative frequency of elements occurring in a list in Python?

For example:

['apple', 'banana', 'apple', 'orange'] # apple for example would be 0.5
Alpine
  • `relative frequency of words` What is that? – thefourtheye Mar 21 '15 at 03:56
  • Possible duplicate of http://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python – Daniel Mar 21 '15 at 04:00
  • 2
    @Alpine, this really sounds like you are asking for us to do your homework. This program is not too difficult. You will want to check the length of the list and you will want to use dictionaries. – skyler Mar 21 '15 at 04:01

5 Answers

9

You can use NLTK for this:

import nltk
text = ['apple', 'banana', 'apple', 'orange']
fd = nltk.FreqDist(text)

Check out the tutorial in the book, the how-to, and the source code.

Alternately, you could use a Counter:

from collections import Counter
text = ['apple', 'banana', 'apple', 'orange']
c = Counter(text)
craighagerman
  • 1
    Isn't NLTK overkill for this? – matsjoyce Mar 24 '15 at 18:12
  • Is NLTK overkill? Depends. If you have NLTK installed already it has the 'batteries included' to calculate Frequency distributions and print out stats (most_common etc) which I find very useful. I do a lot of NLP work and find NLTK very useful. It is hardly overkill for me - just a useful tool for a particular job. But if you aren't doing any NLP work and are just doing a one-off frequency distribution, then it is overkill. That is why I gave two options. – craighagerman Mar 25 '15 at 18:46
  • 2
    That's not the relative frequency. It's just the counts. The relative frequency would have been {apple : 0.5, banana : 0.25, orange : 0.25} – Isbister Mar 06 '17 at 13:41
  • See below for an answer without third party requirements: https://stackoverflow.com/a/58412985. This is a question, that is not (at all) specific to NLP, so 1) the majority of people having a similar issue won't have this issue in the context of NLP and 2) even in that case it should not be assumed that people have nltk installed, due to the variety of NLP frameworks out there. That's regarding the first answer; the second part does not solve the question asked, since it returns absolute frequencies, whereas the question asked for relative frequencies. – pedjjj Apr 19 '20 at 11:31
4

The following snippet does exactly what the question asks for: given a Counter() object, return a dict that contains the same keys but with relative frequencies as values. No third party library required.

def counter_to_relative(counter):
    total_count = sum(counter.values())
    relative = {}
    for key in counter:
        relative[key] = counter[key] / total_count
    return relative
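
As a usage sketch, the function can be combined with collections.Counter to go straight from the list in the question to relative frequencies. This assumes Python 3, where / is true division; the names words and relative are only illustrative.

```python
from collections import Counter


def counter_to_relative(counter):
    # Same logic as above: divide each count by the total number of items.
    total_count = sum(counter.values())
    relative = {}
    for key in counter:
        relative[key] = counter[key] / total_count
    return relative


# Illustrative input taken from the question.
words = ['apple', 'banana', 'apple', 'orange']
relative = counter_to_relative(Counter(words))
print(relative)  # {'apple': 0.5, 'banana': 0.25, 'orange': 0.25}
```

On Python 2 the division would need float(total_count) to avoid integer division.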
pedjjj
3

This simple code does the job. It returns a list of tuples, but you can adapt it easily.

lst = ['apple', 'banana', 'apple', 'orange']
counts = [(word, lst.count(word) / len(lst)) for word in set(lst)] 

It will return the relative frequencies of each word as below:

[('orange', 0.25), ('banana', 0.25), ('apple', 0.5)]

Note that:

  1. iterate over set(lst) to avoid duplicates
  2. divide the lst.count by len(lst) to get relative frequencies
Ram
2

You can do this pretty easily by just counting the number of times the element occurs in the list.

def relative_frequency(lst, element):
    return lst.count(element) / float(len(lst))

words = ['apple', 'banana', 'apple', 'orange']
print(relative_frequency(words, 'apple'))
petabyte
0

Make a dictionary with the words as keys and their occurrence counts as values. Once you have this dictionary, divide each value by the length of the word list.
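
A minimal sketch of that idea, assuming Python 3 (so / is true division). The function name relative_frequencies is made up for illustration.

```python
def relative_frequencies(words):
    # First pass: count how many times each word occurs.
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1

    # Second step: divide each count by the length of the list.
    total = len(words)
    return {word: count / total for word, count in counts.items()}


words = ['apple', 'banana', 'apple', 'orange']
print(relative_frequencies(words))  # {'apple': 0.5, 'banana': 0.25, 'orange': 0.25}
```

This walks the list only once, and an empty list simply yields an empty dictionary because no division is ever performed.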

justanothercoder