2

I was asked to print the word with the highest frequency in a .txt file, and I used the max function to get the key with the maximum value, as follows:

freq = dict()
f = open('words.txt', 'r')
for line in f:
    words = line.split()
    for word in words:
        # Normalize case so "The" and "the" count as the same word
        word = word.lower()
        freq[word] = freq.get(word, 0) + 1
f.close()

maximum=max(freq)
print(maximum)

But after cross-checking, I found that the wrong key was printed. I then changed the second part of the code as follows:

maximum = max(freq, key=freq.get)
print(maximum)

This time, the output matched the word that occurred the most times.

I would like to know why the two cases give different results, and which approach is better for similar problems in the future. Thank you.

Naveed Noor
  • This is the link for txt file: https://github.com/naveed3923/Padh-AI-Foundations-of-Data-Science/blob/master/words.txt – Naveed Noor Jun 12 '20 at 14:10
  • @NaveedNoor To count the occurrences of a word, you can simply use the count() method. Have a look at this: https://www.geeksforgeeks.org/python-string-count/ – Prathamesh Jun 12 '20 at 14:10
  • @PrathameshJadhav That may be true. But I would like to know why the two results differ. – Naveed Noor Jun 12 '20 at 14:12
  • `max` of a dictionary returns (or seems to return...) the max of the keys. So `max(freq)` is equivalent to `max(freq.keys())`, whereas `max(freq, key=freq.get)` returns the key `k` for which `freq.get(k)` is largest. (The largest value itself would be `max(freq.get(k) for k in freq.keys())`.) –  Jun 12 '20 at 14:27

4 Answers

2

This happens because iterating over a dict yields its keys, so max(freq) compares the keys themselves and ignores the values. With string keys that is a lexicographic comparison, so you get the word that comes last alphabetically. When you pass a key function, max applies it to each dict key and returns the dict key for which the function's result is largest:

>>> counts = {"a": 10, "b": 5, "c": 20, "d": 15}
>>> max(counts)
'd'
>>> max(counts, key=counts.get)
'c'
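To make the role of the key function concrete, here is a rough sketch of what `max(counts, key=counts.get)` does. The helper name `max_by_key` is hypothetical and written only for illustration; the real built-in is not implemented this way line for line, but it gives the same results on these inputs, including returning the first maximal item when there is a tie.

```python
def max_by_key(iterable, key):
    """Illustrative re-implementation of max(iterable, key=key) for non-empty input."""
    iterator = iter(iterable)
    best = next(iterator)          # iterating a dict yields its keys
    best_score = key(best)
    for item in iterator:
        score = key(item)
        if score > best_score:     # strict '>' keeps the first item on ties
            best, best_score = item, score
    return best

counts = {"a": 10, "b": 5, "c": 20, "d": 15}
by_value = max_by_key(counts, counts.get)      # compares 10, 5, 20, 15
by_name = max_by_key(counts, lambda k: k)      # compares "a", "b", "c", "d"
print(by_value, by_name)                       # c d
```

With the identity function as the key, the comparison falls back to the dict keys themselves, which is what plain `max(counts)` does.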
Asocia
  • Would this return the first key that has the max value? E.g., if "b": 20 were the case instead, would it ALWAYS return "b"? – benwl May 22 '23 at 08:57
  • @benwl Yes, it would *always* return "b" in that situation (mostly\*). See [this](https://stackoverflow.com/a/6783101/9608759) answer for more info on that. \*: if your Python version is 3.6 or below, it is not guaranteed that the order of keys in your dictionary is preserved. So you might get another result. – Asocia May 22 '23 at 11:50
1

max(freq) returns the maximum key in the dictionary, i.e. the last one in alphabetical (lexicographic) order if the keys are strings. The values are not looked at.

When you add the key=freq.get keyword argument, you get the key x with the maximum value of freq.get(x), i.e. the word with the highest count.
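A quick way to check this is to run both calls on a small sample. The snippet below is only a sketch: the text is made up and stands in for the contents of words.txt, and `collections.Counter` from the standard library is used as a convenient substitute for the manual counting loop in the question.

```python
from collections import Counter

# Made-up sample text; stands in for the contents of words.txt
text = "the cat saw the zebra and the dog"
freq = Counter(word.lower() for word in text.split())

last_alphabetically = max(freq)            # same as max(freq.keys())
most_frequent = max(freq, key=freq.get)    # key whose count is largest
top_word, top_count = freq.most_common(1)[0]

print(last_alphabetically)   # zebra
print(most_frequent)         # the
print(top_word, top_count)   # the 3
```

If the counts are built with a `Counter` anyway, `most_common(1)` is a readable way to get both the word and its count in one step.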

Simon Crane
  • Am I correct, then, in saying that the second approach is the better one? – Naveed Noor Jun 12 '20 at 14:14
  • @Naveed Noor: It depends on what you are interested in, the maximum key or the maximum value: `max(['a', 'b', 'c'])` or `max([10, 5, 20])` – Maurice Meyer Jun 12 '20 at 14:17
  • They do different things – Simon Crane Jun 12 '20 at 14:21
  • @SimonCrane Indeed, as I commented by the question -- what I can't find is a reference that tells us that `max` really is operating on the dictionary's keys, as opposed say to its items as tuples, as in `max(u for u in myDict.items())`. The observation that the key function is making a difference suggests that it is *not* receiving items but is actually receiving each dictionary key in turn. –  Jun 12 '20 at 14:29
  • Tuples are sorted by their first element first, and dictionary keys are unique, so sorting by tuples is the same as sorting by keys – Simon Crane Jun 12 '20 at 18:00
0

This is because max(freq) compares the words (the dict's keys) against each other.

With key=freq.get, on the other hand, the values in the dict are compared, and the key with the highest value is returned.

Jab
0

It can happen that more than one key has the maximum value. The option

maximum = max(freq, key=freq.get)

will return the first key (in the dict's iteration order) that has the maximum value. If you want a different one, you can use the following code:

def select_key_with_max_value(my_dict):
    # Largest value stored in the dict (compare values, not (key, value) tuples)
    max_value = max(my_dict.values())
    # All keys that share the maximum value, in alphabetical order
    keys_with_max = sorted(key for key, value in my_dict.items()
                           if value == max_value)
    num = len(keys_with_max)
    print("There are {} items with a maximum value.".format(num))
    # The user answers with a 1-based position; convert it to a 0-based index
    order = int(input("Which of the key(s) do you want by the keys' alphabetic order?")) - 1
    return keys_with_max[order]

As an example:

Tv = { 'GameOfThrones':100, 'BreakingBad':100,'TMKUC' : 100} 

select_key_with_max_value(Tv)

results in:

There are 3 items with a maximum value.
Which of the key(s) do you want by the keys' alphabetic order?3
'TMKUC'
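If prompting the user is not needed, a non-interactive variant can simply return every key that shares the maximum value and leave the choice to the caller. This is a minimal sketch; the function name `keys_with_max_value` is made up for this example.

```python
def keys_with_max_value(my_dict):
    """Return all keys whose value equals the largest value, sorted alphabetically."""
    if not my_dict:
        return []                  # max() would raise on an empty dict
    max_value = max(my_dict.values())
    return sorted(key for key, value in my_dict.items() if value == max_value)

Tv = {'GameOfThrones': 100, 'BreakingBad': 100, 'TMKUC': 100}
tied = keys_with_max_value(Tv)
print(tied)   # ['BreakingBad', 'GameOfThrones', 'TMKUC']
```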
Hermes Morales