3

Let's say we have an array of integers or even continuous stream of integers. The idea is to print unique elements in descending order based on occurrence frequencies. For example, for 7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1 we should get: 2, 4, 6, 7, 9, 5, 0, 1 as 2 appears three times, 4 and 6 two time and the rest just one time.

Is there any better and efficient approach than (sorting map based by value) counting the occurances of elements , storing them in map and then sort the map based on value.?

nits.kk
  • 5,204
  • 4
  • 33
  • 55
whiteErru
  • 275
  • 1
  • 5
  • 13
  • what programming language? – RomanPerekhrest Mar 23 '17 at 11:31
  • Let's assume that it's Java, however this is more to find an effective algorithm than using programming language power and libraries as at the end programming languages use smart algorithms. – whiteErru Mar 23 '17 at 11:53
  • the obvious choice would be a sorted map as you already mentioned, but it might also be worth it to look into using a full fledged RDBMS and use the power of database indexing, depending on the size of your data that should also work for a streaming list of values and you don't have to worry about out of memory problems with in memory maps etc. – xander Mar 23 '17 at 12:14

3 Answers3

2

However, it seems to me that there should be much effective algorithm for this, as probably there is a way for sorting frequencies on fly.

This problem is actually Omega(nlogn) in the algebraic tree model (hashing is not allowed under that model), with reduction from Element Distinctness Problem, so if you were hoping to get a one (or constant number) of iterations to solve it, without any auxillary data structure under the hood to solve it - that's impossible, as it will allow us to solve element distinctness in O(n) in Algebraic tree model.

The canonical solutions for these types of problems are:

  1. (Your suggestion): Build a map, remove duplicates and sort by frequency
  2. Similar to 1, but instead of a map - sort the items, and finding the number of times each element repeats is easy using binary search.
Community
  • 1
  • 1
amit
  • 175,853
  • 27
  • 231
  • 333
0

If you are using Python then use the collections.Counter class and its Counter.most_common() method

from collections import Counter
counted = Counter([7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1])
ordered = [value for value, count in counted.most_common()]
print(ordered)  # [2, 4, 6, 0, 1, 5, 7, 9]

The source is available showing a heapq being used in most_common()

Paddy3118
  • 4,704
  • 27
  • 38
0

Concentrating on the requirement in the question of possibly accepting the input as a stream of integers, one solution would be a modified insertion sort...

Create a list (herein named countlist) of lists, initially empty. Each time you accept a new value, iterate descending through each list in countlist, looking for the value. If you find the value in countlist[i], remove the value from its current list and insert it into countlist[i+1], adding a new list to countlist if necessary. If you never find the value, insert the value into countlist[1].

The purpose of iterating descending instead of ascending is that it allows you to maintain a pointer to where the value will be inserted into countlist[i] if you find it in countlist[i-1]. If you do not need a sub-sort on the values that share the same count, you can skip this pointer and iterate ascending.

I believe this algorithm is O(n2) overall. Processing each new value is O(n) and results are kept sorted as you go. At any point, you can print the correct order (so far) by iterating descending through countlist and printing each list.

Sample run, using the example in the question...

input: 7, 4, 2, 4, 9, 6, 5, 6, 2, 0, 2, 1

After accepting 7:
countlist[1] = 7

After accepting 4:
countlist[1] = 4, 7

After accepting 2:
countlist[1] = 2, 4, 7

After accepting 4:
countlist[1] = 2, 7
countlist[2] = 4

After accepting 9:
countlist[1] = 2, 7, 9
countlist[2] = 4

After accepting 6:
countlist[1] = 2, 6, 7, 9
countlist[2] = 4

After accepting 5:
countlist[1] = 2, 5, 6, 7, 9
countlist[2] = 4

After accepting 6:
countlist[1] = 2, 5, 7, 9
countlist[2] = 4, 6

After accepting 2:
countlist[1] = 5, 7, 9
countlist[2] = 2, 4, 6

After accepting 0:
countlist[1] = 0, 5, 7, 9
countlist[2] = 2, 4, 6

After accepting 2:
countlist[1] = 0, 5, 7, 9
countlist[2] = 4, 6
countlist[3] = 2

After accepting 1:
countlist[1] = 0, 1, 5, 7, 9
countlist[2] = 4, 6
countlist[3] = 2
  • Interesting approach, but I believe sorting a map by value would be O(nlogn), which is better than O(n^2). – whiteErru Mar 23 '17 at 20:45
  • Agreed. I was focusing on the case (mentioned by the question) where the input is a stream of integers. So, I constructed the algorithm to not require knowing how many values we will get, and to not require "re-sorting" after accepting each input. – James Droscha Mar 23 '17 at 20:57