Most frequent values in a dictionary

Question

I have the following dictionary:

d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}

I would like to create a dictionary which gives the number of occurrences of each value. Basically, it would look like:

output = {"MRS":2,"PRS":1,"NTS":1,"VAL":1}

Does anyone know how I could do that? Thanks in advance!

The structure of your dictionary is weird. Why are the values not always in lists? This makes it more difficult to handle. `d = {"a":["MRS","VAL"], "b":["PRS"], "c":["MRS"], "d":["NTS"]}` would be preferable. — Tim Pietzcker, Nov 24 '15 at 18:09
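To illustrate the comment above: if every value were a list, as suggested, the count would need no type checks at all. A minimal sketch, assuming Python 3; the names uniform and uniform_counts are illustrative and not from the thread:

```python
from collections import Counter
from itertools import chain

# The dictionary restructured so that every value is a list
uniform = {"a": ["MRS", "VAL"], "b": ["PRS"], "c": ["MRS"], "d": ["NTS"]}

# chain.from_iterable concatenates the lists lazily; Counter tallies the items
uniform_counts = Counter(chain.from_iterable(uniform.values()))
print(dict(uniform_counts))
```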

score 8 · Answer 1 · edited May 23 '17 at 12:15

Since your dict is composed of both strings and lists of strings, you first need to flatten those elements to a common type of string:

import collections
d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

>>> list(flatten(d.values()))
['MRS', 'VAL', 'MRS', 'PRS', 'NTS']

You can then use a Counter to count the occurrences of each string:

>>> collections.Counter(flatten(d.values())) 
Counter({'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1})
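The code above targets Python 2 (basestring, collections.Iterable). A possible Python 3 equivalent is sketched below, assuming Iterable is imported from collections.abc and str takes the place of basestring; excluding bytes as well is an extra assumption that the original answer does not make:

```python
from collections import Counter
from collections.abc import Iterable

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

def flatten(items):
    """Yield the leaf elements of arbitrarily nested iterables."""
    for el in items:
        # Strings are iterable too, so they must be excluded explicitly;
        # otherwise each string would be split into characters.
        if isinstance(el, Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

counts = Counter(flatten(d.values()))
print(dict(counts))
```

Because flatten recurses, this sketch should also cope with deeper nesting such as [["MRS", ["VAL"]], "PRS"].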

answered Nov 24 '15 at 18:10

dawg


[Link to flatten recipe using hasattr.](http://code.activestate.com/recipes/577255-flatten-a-list-or-list-of-lists-etc/) Are there any advantages to checking whether it's a list or tuple instead of just iterable? The most common thing that your version would exclude would be `set`, I guess. – Cody Piersall Nov 24 '15 at 18:15

score 4 · Answer 2 · answered Nov 24 '15 at 18:22

As already posted, collections.Counter is the obvious approach. Alternatively, you can use itertools.groupby on its own, or a combination of itertools.groupby and collections.Counter.

Just itertools.groupby

>>> from itertools import groupby
>>> a, b = [list(g) for _,  g in groupby(d.values(), type)]
>>> {k: len(list(g)) for k, g in groupby(sorted(a[0] + b))}
{'NTS': 1, 'VAL': 1, 'PRS': 1, 'MRS': 2}

itertools.groupby and collections.Counter

>>> from collections import Counter
>>> from itertools import groupby
>>> a, b = [list(g) for _,  g in groupby(d.values(), type)]
>>> dict(Counter(a[0] + b))
{'NTS': 1, 'VAL': 1, 'PRS': 1, 'MRS': 2}

This does the job for the OP's specific data, but it is not robust: the a, b unpacking assumes the values fall into exactly two runs by type, with the list first.
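One way to keep the groupby idea without depending on the order or number of type runs is to normalise every value to a list first. This is only a sketch, assuming Python 3 and values that are either a string or a list of strings; the names as_lists, flat and grouped_counts are illustrative:

```python
from itertools import chain, groupby

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

# Wrap bare strings in a list so that every value has the same shape
as_lists = [[v] if isinstance(v, str) else v for v in d.values()]

# groupby only merges adjacent equal items, hence the sort
flat = sorted(chain.from_iterable(as_lists))

# Count the members of each group without building intermediate lists
grouped_counts = {key: sum(1 for _ in group) for key, group in groupby(flat)}
print(grouped_counts)
```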

score 1 · Answer 3 · edited May 23 '17 at 11:44

In general, you can use a Counter to map keys to counts - it's essentially a multiset.

Since your dict is multi-dimensional you'll have to do a little transforming, but if you simply iterate over every value and sub-value in your dict and add it to a Counter instance, you'll get what you want.

Here's a first-pass implementation; depending on exactly what d will contain you may have to tweak it a bit:

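import types
from collections import Counter, Iterable  # Python 2; on Python 3 import Iterable from collections.abc

counts = Counter()
for elem in d.values():
  # types.StringTypes is Python 2 only; on Python 3 use str instead
  if isinstance(elem, Iterable) and not isinstance(elem, types.StringTypes):
    for sub_elem in elem:
      counts[sub_elem] += 1  # Counter has no add(); increment the count directly
  else:
    counts[elem] += 1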

Notice that we check if elem is an iterable and not a string. Python doesn't make distinguishing between strings and collections easy, so if you know d will contain only strings and lists (for instance) you can simply do isinstance(elem, list) and so on. If you can't guarantee the values of d will all be lists (or tuples, or so on) it's better to explicitly exclude strings.

Also, if d could contain nested values (e.g. a list containing lists containing strings) this won't be sufficient; you'll likely want to write a recursive function to flatten everything, like dawg's solution.
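For the simpler case mentioned above, where d is known to hold only strings and lists of strings, the loop might be reduced as follows. A sketch assuming Python 3; the name simple_counts is illustrative:

```python
from collections import Counter

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

simple_counts = Counter()
for elem in d.values():
    if isinstance(elem, list):
        # update() with an iterable adds one to the count of each member
        simple_counts.update(elem)
    else:
        # A bare string: increment directly, because update("MRS")
        # would count the characters M, R and S separately
        simple_counts[elem] += 1

print(dict(simple_counts))
```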

Hai Vu · Answer 4 · 2015-11-24T21:24:28.543

I am lazy, so I am going to use library functions to get the job done for me:

import itertools
import collections

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}
values = [[x] if isinstance(x, basestring) else x for x in d.values()]
counter = collections.Counter(itertools.chain.from_iterable(values))
print counter
print counter['MRS']  # Sampling

Output:

Counter({'MRS': 2, 'NTS': 1, 'PRS': 1, 'VAL': 1})
2

At the end, counter acts like the dictionary you want.

Explanation

Consider this line:

values = [[x] if isinstance(x, basestring) else x for x in d.values()]

Here, I turned every value in the dictionary d into a list to make processing easier. values might look something like the following (order might be different, which is fine):

# values = [['MRS', 'VAL'], ['MRS'], ['PRS'], ['NTS']]

Next, the expression:

itertools.chain.from_iterable(values)

returns an iterator that flattens the list. Conceptually, the list now looks like this:

['MRS', 'VAL', 'MRS', 'PRS', 'NTS']

Finally, the Counter class takes that flattened sequence and counts each item, so we end up with the final result.
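The snippet in this answer is Python 2 (basestring and the print statement). A possible Python 3 port is sketched below, assuming str as the replacement for basestring; the most_common call is an addition that ties back to the question title and is not part of the original answer:

```python
import collections
import itertools

d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

# Wrap bare strings in a one-element list so that every value is a list
values = [[x] if isinstance(x, str) else x for x in d.values()]

# Flatten the list of lists and count each item
counter = collections.Counter(itertools.chain.from_iterable(values))

print(counter["MRS"])          # count for a single value
print(counter.most_common(1))  # the most frequent value with its count
```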

score 0 · Answer 5 · answered Nov 24 '15 at 18:54

You can do it with just built-in functions, this way:

>>> d = {"a":["MRS","VAL"],"b":"PRS","c":"MRS","d":"NTS"}
>>> 
>>> flat = []
>>> for elem in d.values():
    if isinstance(elem, list):
        for sub_elem in elem:
            flat.append(sub_elem)
    else:
        flat.append(elem)


>>> flat
['MRS', 'VAL', 'MRS', 'PRS', 'NTS']
>>> 
>>> output = {}
>>> 
>>> for item in flat:
    output[item] = flat.count(item)
>>>
>>> output
{'NTS': 1, 'PRS': 1, 'VAL': 1, 'MRS': 2}
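Calling flat.count(item) inside the loop rescans the whole list for every item, which is quadratic in the number of values. A single-pass variant that still uses only built-ins could look like the sketch below, assuming Python 3 and values that are either a string or a list of strings:

```python
d = {"a": ["MRS", "VAL"], "b": "PRS", "c": "MRS", "d": "NTS"}

output = {}
for elem in d.values():
    # Treat a bare string as a one-element list
    items = elem if isinstance(elem, list) else [elem]
    for item in items:
        # dict.get supplies 0 the first time an item is seen
        output[item] = output.get(item, 0) + 1

print(output)
```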
