2

I'm new to python and I'm trying to write a function whose description is as follows: I have a list of integers. From this list I have to find the item with max frequency and print that. This seems pretty straight forward unless I have a limitation that the function must complete the execution within 10 sec and should consume memory < 512 MB. For shorter list length my function works fine but for a list of length 100000 it tanks. I'm not able to optimize the code. I have 2 implementations for the same:

Implementation #1

def returnMaxFrequency(ar):
    freqList = []
    for val in ar:
        freq = ar.count(val)
        freqList.append(freq)
    return(max(freqList))

Implementation #2

def returnMaxFrequency(ar):   
    freqDict = {x:ar.count(x) for x in ar}   
    maxFreq = max(freqDict.values())
    return maxFreq

Eg

if ar = [3 2 1 3]
o/p: 2

Using NumPy is not an option here. (Can't use external package)

noobi
  • 97
  • 2
  • 6

6 Answers6

4

Easiest (and reasonably fast) is likely the built-in Counter:

from collections import Counter
winner = Counter(ar).most_common(1)[0]

An even faster method (and using no extra memory, but destroying the original array) is given in this article, replicated here:

# Python program to find the maximum repeating number 

# Returns maximum repeating element in arr[0..n-1]. 
# The array elements are in range from 0 to k-1 
def maxRepeating(arr, n,  k): 

    # Iterate though input array, for every element 
    # arr[i], increment arr[arr[i]%k] by k 
    for i in range(0,  n): 
        arr[arr[i]%k] += k 

    # Find index of the maximum repeating element 
    max = arr[0] 
    result = 0
    for i in range(1, n): 

        if arr[i] > max: 
            max = arr[i] 
            result = i 

    # Uncomment this code to get the original array back 
    #for i in range(0, n): 
    #    arr[i] = arr[i]%k 

    # Return index of the maximum element 
    return result 

(Parts of this code could be replaced by more performant alternates, in particular using the max function instead of the second loop.)

Amadan
  • 191,408
  • 23
  • 240
  • 301
3

Both of your implementations are basically the same, the second one only uses a list comprehension instead of the for loop. Both algorithms are in O(n^2) because count is in O(n) and you are calling it n times (once for each value).

If you want to optimize, reduce the complexity (to O(n)):

def returnMaxFrequency(ar):   
    freqDict = {x:0 for x in ar}
    for val in ar:
        freqDict[val] = freqDict[val] + 1
    maxFreq = max(freqDict.values())
    return maxFreq
kutschkem
  • 7,826
  • 3
  • 21
  • 56
3

Hope this helps!

We are using Python's High-Performance container datatypes(Counter)

from collections import Counter

def returnMaxFrequency(ar):
    return max(Counter(t).values())

Counter does a frequency mapping of your number and created a dict, once dict is created you are using a max to get max-freq of the list.

Using Dict is an efficient way of generating a frequency count, unless your are going for distributed compute solutions

Note: collections is a python in-built package i.e. comes with setup. Not an external library.

sam
  • 1,819
  • 1
  • 18
  • 30
  • He wrote he can't even use `numpy` as he "Can't use external package" – Aryerez Oct 23 '19 at 07:04
  • 1
    @Aryerez - It's in-built package. Here is a link for your reference - https://docs.python.org/2/library/collections.html – sam Oct 23 '19 at 07:07
  • Regarding "High-Performance container datatypes": `Counter` is implemented in Python, and is no more performant than the `dict` solution, just more readable and succint. It is way faster than any solution involving `count`, though. – Amadan Oct 23 '19 at 07:14
  • @Amadan - thanks! can you prove if `dict` has faster performance than `Counter`? – sam Oct 23 '19 at 07:21
  • `Counter` uses `dict` ([source](https://github.com/python/cpython/blob/master/Lib/collections/__init__.py#L472)). Thus, a solution using `Counter` is doing almost the same thing as the solution using `dict`, just in one case it's you who needs to write the counting code, and in the other it's the Python developers who've already done it for you. (`Counter` would be a little bit slower because it does a bit more checking of parameter correctness and types which a custom `dict` solution would not be doing.) – Amadan Oct 23 '19 at 07:25
  • 1
    @Amadan I just reran the timings given in [this answer](https://stackoverflow.com/a/6988979/1025391) and it seems `Counter` is actually faster than solutions based on `dict` and `defaultdict` for recent Python versions (tested this with v3.6.7). – moooeeeep Oct 23 '19 at 07:50
  • For reference: [tio](https://tio.run/##rZLBcsIgEIbveQomJ6g0k9hEZzqTU6@@gXUyEdEyBkiBdJqnTxfQaquHHhq48P@7y8dm@9G9afX0aF3Ljh23dpqE7LVxyI422RstEdNdx5kTWll08l70oBw3FO34vh06txPMxWABstO6@w49GD302zG6uuemddqcTYiWB@4gJ/pOSC7c2Y2nJFmhGq0LOqdlRSvYtLxaVVktKOwFXdLFcvNQ5HmeJMCFZPvZaMYGY7hi3DYFtvy9XpHnBMGXeuaA6yFsGkQGN3kdk3Dce1JwkVAIcmNiiFt7eQPRLAN@7E8U5QTNUBGCDHeDUR4BsyxcgAlFRz7WlyfjgpD7pPNfpJcm3wW@2Fgo90f0Wf0vqOVPVBYHI72ufBoWvCKZ1NY1TEupFZRc55sk@eDGhtGCf3xTnN625lYqoEpv4OUYRjY71aMofVUpQPtGeM034nxXhI05XsuaZqdZ09AQ4Z8fRy@YFKlBbrmp53lOyDR9AQ) – moooeeeep Oct 23 '19 at 07:55
  • @moooeeeep: My bad — I just saw a C helper function that Python loads "if available" (`_collections._count_elements`). Without it, it does the same as I said before, but with it, it is indeed faster. This is the only sped-up part, so my eyes just skipped over it. – Amadan Oct 23 '19 at 08:01
  • @moooeeeep, Amadan - Thanks to you. I learned something new today. – sam Oct 23 '19 at 08:40
1

Returns most frequently occurring value in list

max(set(ar), key=ar.count) 
AnswerSeeker
  • 288
  • 1
  • 10
0

What about this?:

max(ar.count(i) for i in ar)

Or this?:

max(map(ar.count,ar))
Sayandip Dutta
  • 15,602
  • 4
  • 23
  • 52
0

The second implement is nice, but inside the dict-comprehension change ar to set(ar), and it will check every item only once:

def returnMaxFrequency(ar):   
    freqDict = {x:ar.count(x) for x in set(ar)}   
    maxFreq = max(freqDict.values())
    return maxFreq
Aryerez
  • 3,417
  • 2
  • 9
  • 17