Printing the largest and the second largest element in a dictionary

Question

I've written a function that takes a string as an input and hands over a dictionary that indicates the most occurring elements/characters.

Part 1 - Just finding the most occurring character

def mostoccur(arr):

   n = len(arr)
   dict={}

   # filling the dictionary with itesm and their frequency using dict.get()

   for i in range(n):
       dict[arr[i]] = dict.get(arr[i],0) + 1


   return dict


string = "aabbccdddddddd"

# turning the string into a list, so it's iterable

ls = list(string)
ans = mostoccur(ls)
print("The dictionary of all the characters and their frequency: \n",ans)

maximum = max(ans)
print("\nThe most occuring character is:",maximum)

But then I got more curious, and I wanted to print the most occurring and second most occurring element in the dictionary. Thus, I've written something like this:

Part 2 - Alright, now let's find the second most occurring character

# defining a dictionary to store the most and second most occurring element

biggest = {'most-occuring':0, 'second-most-occuring':0}

# nested loops to go through all values in the dictionary

for a in ans.items():                                  # O(N)
   for b in ans.items():                               # O(N)
       if a < b:                                       
           a,b = b,a
           biggest['most-occuring'] = a
           biggest['second-most-occuring']= b

                                                       # Total = O(N^2)

print(biggest)

Big O

I've written the Big O of each operation next to it, and when I look at it, I really don't like what I've written. I mean, O(N^2) sounds too expensive and inefficient.

Would you be willing to illuminate me on better ways of writing this?

Please bear in mind that I'm not looking for a method that utilizes any libraries.

This is `O(n**2)` because `a < b` is `O(1)`. An `O(n)` solution would be to iterate through the list once and keep track of the two highest values you've seen. Also, see [`collections.Counter.most_common`](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) — Patrick Haugh, Aug 10 '18 at 14:31
My understanding is that, the 'if' is gonna happen for each cycle of the 'loop' and therefore, it is (O (f(n) x g(n)), am I wrong? — Pouya Ataei, Aug 10 '18 at 14:34
[Counting repeated characters in a string in python](https://stackoverflow.com/questions/991350/counting-repeated-characters-in-a-string-in-python) may give you another perspective on how to simplify your approach. — benvc, Aug 10 '18 at 14:37
@PatrickHaugh the link you've sent, requires a library. I have to import 'Counter' from 'collections', which I was aware of before posting the question... — Pouya Ataei, Aug 10 '18 at 14:48
@NeophytePolyhistor I was recommending that you actually go and read the source code to see how `Counter.most_common` is implemented, sorry if I was unclear. — Patrick Haugh, Aug 10 '18 at 14:50

TallChuck · Accepted Answer · 2018-08-10T15:16:55.073

2

Here is a simple algorithm that does it in O(n):

string = "aabbbccdddddddd"

def get_frequency(string):
    chars = {}
    for char in string:
        chars[char] = chars.get(char, 0) + 1 

    biggest = [None, 0]
    second = [None, 0]

    for entry in chars.items():
        char, count = entry
        if count > biggest[1]:
            second = biggest
            biggest = entry
        elif count > second[1]:
            second = entry

    return {
        "most-occurring": biggest[0],
        "second-most-occurring": second[0]
    }

print(get_frequency(string))

This prints {'second-most-occurring': 'b', 'most-occurring': 'd'}

Note that I added an extra 'b' to string in order to make that the second most frequent letter

edited Aug 10 '18 at 15:16

answered Aug 10 '18 at 15:09

TallChuck

1,725
11
28

You could do it more pythonic combining two lines into -> second,biggest = biggest, entry – Pouya Ataei Aug 11 '18 at 05:07
Overall, thank you very much for the solution. I like it :) – Pouya Ataei Aug 11 '18 at 05:42
Hello mate, I've witnessed that removing 'None' from the array would result in an error! I don't understand why! would you illuminate me ? – Pouya Ataei Aug 12 '18 at 06:40
The for loop is trying to access the second element (index 1) of each array, if you remove None, you'll get an IndexError – TallChuck Aug 13 '18 at 13:10
so why don't we still get the index error after having loop with one element and accessing index 0 ? – Pouya Ataei Aug 13 '18 at 13:56

T Burgis · Answer 2 · 2018-08-10T15:03:03.357

0

Use heapq.nlargest, like this:

from collections import Counter
import heapq
string = "aabbccdddddddd"
counts = Counter(string)
heapq.nlargest(2, counts, key=lambda k: counts[k])

And using no libraries, assuming your function returns the same thing as Counter:

keys = list(counts.keys())
keys.sort(key=lambda x: counts[x],reverse=True)
top_two = keys[:2] # just the keys
{ k : counts[k] for k in keys[:2] } # dict of top two

edited Aug 10 '18 at 15:03

answered Aug 10 '18 at 14:49

T Burgis

1,395
7
9

No libraries, no built-in functions. Read the question carefully. I'm aware of the Counter. – Pouya Ataei Aug 10 '18 at 14:51
Sorry. Missed that line. Adding an edit to my answer. – T Burgis Aug 10 '18 at 15:02

Printing the largest and the second largest element in a dictionary

Part 1 - Just finding the most occurring character

Part 2 - Alright, now let's find the second most occurring character

Big O

2 Answers2