1

I've written a function that takes a string as an input and hands over a dictionary that indicates the most occurring elements/characters.

Part 1 - Just finding the most occurring character

def mostoccur(arr):

   n = len(arr)
   dict={}

   # filling the dictionary with itesm and their frequency using dict.get()

   for i in range(n):
       dict[arr[i]] = dict.get(arr[i],0) + 1


   return dict


string = "aabbccdddddddd"

# turning the string into a list, so it's iterable

ls = list(string)
ans = mostoccur(ls)
print("The dictionary of all the characters and their frequency: \n",ans)

maximum = max(ans)
print("\nThe most occuring character is:",maximum)

But then I got more curious, and I wanted to print the most occurring and second most occurring element in the dictionary. Thus, I've written something like this:

Part 2 - Alright, now let's find the second most occurring character

# defining a dictionary to store the most and second most occurring element

biggest = {'most-occuring':0, 'second-most-occuring':0}

# nested loops to go through all values in the dictionary

for a in ans.items():                                  # O(N)
   for b in ans.items():                               # O(N)
       if a < b:                                       
           a,b = b,a
           biggest['most-occuring'] = a
           biggest['second-most-occuring']= b

                                                       # Total = O(N^2)

print(biggest)

Big O

I've written the Big O of each operation next to it, and when I look at it, I really don't like what I've written. I mean, O(N^2) sounds too expensive and inefficient.

Would you be willing to illuminate me on better ways of writing this?

Please bear in mind that I'm not looking for a method that utilizes any libraries.

Pouya Ataei
  • 1,959
  • 2
  • 19
  • 30
  • Why do you think `if a < b` is O(n)? I see that as O(1). – jpp Aug 10 '18 at 14:30
  • 2
    This is `O(n**2)` because `a < b` is `O(1)`. An `O(n)` solution would be to iterate through the list once and keep track of the two highest values you've seen. Also, see [`collections.Counter.most_common`](https://docs.python.org/3/library/collections.html#collections.Counter.most_common) – Patrick Haugh Aug 10 '18 at 14:31
  • My understanding is that, the 'if' is gonna happen for each cycle of the 'loop' and therefore, it is (O (f(n) x g(n)), am I wrong? – Pouya Ataei Aug 10 '18 at 14:34
  • 3
    [Counting repeated characters in a string in python](https://stackoverflow.com/questions/991350/counting-repeated-characters-in-a-string-in-python) may give you another perspective on how to simplify your approach. – benvc Aug 10 '18 at 14:37
  • 1
    @PatrickHaugh the link you've sent, requires a library. I have to import 'Counter' from 'collections', which I was aware of before posting the question... – Pouya Ataei Aug 10 '18 at 14:48
  • @NeophytePolyhistor I was recommending that you actually go and read the source code to see how `Counter.most_common` is implemented, sorry if I was unclear. – Patrick Haugh Aug 10 '18 at 14:50
  • @PatrickHaugh oh great, that's a good idea! – Pouya Ataei Aug 10 '18 at 14:50

2 Answers2

2

Here is a simple algorithm that does it in O(n):

string = "aabbbccdddddddd"

def get_frequency(string):
    chars = {}
    for char in string:
        chars[char] = chars.get(char, 0) + 1 

    biggest = [None, 0]
    second = [None, 0]

    for entry in chars.items():
        char, count = entry
        if count > biggest[1]:
            second = biggest
            biggest = entry
        elif count > second[1]:
            second = entry

    return {
        "most-occurring": biggest[0],
        "second-most-occurring": second[0]
    }

print(get_frequency(string))

This prints {'second-most-occurring': 'b', 'most-occurring': 'd'}

Note that I added an extra 'b' to string in order to make that the second most frequent letter

TallChuck
  • 1,725
  • 11
  • 28
0

Use heapq.nlargest, like this:

from collections import Counter
import heapq
string = "aabbccdddddddd"
counts = Counter(string)
heapq.nlargest(2, counts, key=lambda k: counts[k])

And using no libraries, assuming your function returns the same thing as Counter:

keys = list(counts.keys())
keys.sort(key=lambda x: counts[x],reverse=True)
top_two = keys[:2] # just the keys
{ k : counts[k] for k in keys[:2] } # dict of top two
T Burgis
  • 1,395
  • 7
  • 9