0

Given a set of numbers, is there an algorithm or method to split them into different groups and count the members of each group?

Something like:

input : [1,2,3,4,5,100,200,1000,2500,3000]
output : 1-5         : 5
         100 -200    : 2   
         1000 - 3000 : 3

input : [1,1,2,3,4,5,6,7,8,9,10,11,15,75,80]
output : 1 - 15   : 13
         75 - 80  : 2  

input : [1,100,1000]
output : 1    : 1
         100  : 1
         1000 : 1

Say the number of groups should be at least 2 and at most 10. How can this be done?

Sreejithc321

3 Answers

0

You need some kind of clustering. With the number of groups (clusters) limited to 2..10, k-means looks like a good choice.

You will also definitely need a metric that depends on the magnitude of the numbers, because the plain difference is not suitable for separating 1,2,3 and 100,200 into different clusters. Maybe the logarithm of the value?
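To illustrate the logarithm idea, here is a minimal sketch of 1-D k-means (Lloyd's algorithm) run on log10 of the values. It is only one possible reading of the suggestion: the function name `kmeans_log_1d`, the evenly spaced starting centroids and the iteration cap are assumptions made for the example, it requires all values to be positive, and it takes `k` as given rather than choosing it from the 2..10 range.

```python
import math

def kmeans_log_1d(values, k, iterations=100):
    """Cluster positive numbers with 1-D k-means on log10(value).

    Returns a sorted list of (min, max, count) tuples, one per non-empty cluster.
    Assumes every value is > 0 and k >= 2.
    """
    logs = [math.log10(v) for v in values]
    lo, hi = min(logs), max(logs)
    # Deterministic start (an assumption): centroids evenly spaced over the log range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in logs]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(logs, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = sum(members) / len(members)
    report = []
    for c in range(k):
        members = [v for v, lab in zip(values, labels) if lab == c]
        if members:
            report.append((min(members), max(members), len(members)))
    return sorted(report)

print(kmeans_log_1d([1, 2, 3, 4, 5, 100, 200, 1000, 2500, 3000], 3))
# [(1, 5, 5), (100, 200, 2), (1000, 3000, 3)]
```

On the first example from the question this reproduces the desired groups, but with a different `k` or different starting centroids the result may differ.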

MBo
  • Is there any method other than k-means? I need this calculation to run in near real time. – Sreejithc321 May 20 '16 at 07:38
  • Yes, there are a lot of them: https://en.wikipedia.org/wiki/Cluster_analysis. k-means is just the best known, and implementations are widely available – MBo May 20 '16 at 07:44
0

This is the kind of problem where machine learning is helpful. Here is a simple and nice solution for this problem: Clustering values by their proximity in python (machine learning?). It uses numpy and sklearn, which need to be installed first.

smaftoul
0

The task you are asking about is somewhat ambiguous, since the grouping criterion is not well defined.

Provided the set contains at least two different numbers, I would propose the following approach:

  1. find the span of the numbers
  2. define the boundaries of 10 non-overlapping bins covering the span such that the minimal and maximal elements fall into different bins
  3. group the numbers into the bins
  4. discard empty bins (at least 2 bins will remain, since the minimal and maximal numbers are in different bins)
  5. inspect the contents of the remaining bins and print your report

Of course, the groups you obtain that way will be more or less arbitrary. If you want to avoid a grouping like this:

input : [1,1,2,3,4,5,6,7,8,9,10,11,15,75,80]
output : 1 - 8   : 9
         9 - 15  : 4
         75 - 80 : 2

then you should:

  1. define a criterion for the goodness of your clusters
  2. look for a suitable clustering algorithm
abukaj