I have labels and their frequencies (i.e., the number of times each label is repeated) for a dataset.

Is there a library that can group together labels whose frequencies are nearly the same (i.e., grouping based on how much the frequencies vary)?

As an example: suppose a is repeated 10 times, b 9 times, c 6 times, d 5 times, and e 2 times. I want a and b to fall into one group, c and d into another group, and e into a third group.

Mandroid

1 Answer

You can use the following function to group labels that have exactly the same count.

def group_labels(cnts):
  # Map each count to the list of labels that have that count
  d = {}
  for k, v in cnts.items():
    d.setdefault(v, []).append(k)
  return sorted(d.values(), key=lambda x: x[0]) # sorted by first label

Example

cnts = {'a': 4, 'b': 15, 'c':4, 'd':16, 'e':1, 'f':16}
print(group_labels(cnts))
[['a', 'c'], ['b'], ['d', 'f'], ['e']]
DarrylG
  • Thanks for the input. What I actually need is to group those elements which fall within a range, or whose difference is within a given limit. – Mandroid Nov 05 '19 at 02:56
    @Mandroid--is your goal to cluster labels based upon a value [similar to this problem](https://stackoverflow.com/questions/18364026/clustering-values-by-their-proximity-in-python-machine-learning)? – DarrylG Nov 05 '19 at 04:12
  • Exactly. Thanks a lot. – Mandroid Nov 05 '19 at 04:33
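
For the requirement clarified in the comments (grouping labels whose frequencies differ by no more than a given limit), one possible approach is to sort the labels by frequency and start a new group whenever the gap to the previous frequency exceeds the limit. The sketch below is only an illustration of that idea, not the method from the linked question; the function name `group_by_gap` and the parameter `max_gap` are hypothetical names chosen for this example.

```python
def group_by_gap(cnts, max_gap=1):
    """Group labels whose frequency is within max_gap of the previous
    label's frequency, after sorting by frequency (highest first)."""
    groups = []
    prev = None
    # Walk the labels from most to least frequent
    for label, freq in sorted(cnts.items(), key=lambda kv: kv[1], reverse=True):
        if prev is not None and prev - freq <= max_gap:
            # Close enough to the previous frequency: same group
            groups[-1].append(label)
        else:
            # First label, or gap too large: start a new group
            groups.append([label])
        prev = freq
    return groups

# Frequencies from the question (assumed limit of 1 between neighbours)
cnts = {'a': 10, 'b': 9, 'c': 6, 'd': 5, 'e': 2}
print(group_by_gap(cnts, max_gap=1))
# [['a', 'b'], ['c', 'd'], ['e']]
```

Note that this compares each frequency only with its immediate neighbour, so a long chain of closely spaced values ends up in one group even if its first and last members differ by more than `max_gap`. If the limit should apply to the whole group, compare against the first frequency in the current group instead.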