Counting matching dictionaries

Question

I have a list containing dictionaries:

[{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

I wish to count the occurrences of each dict and add a new field called count.

The desired outcome looks like this:

[{'x': u'osgb32', 'y': u'osgb4000', 'count': 3},
 {'x': u'osgb4340', 'y': u'osgb4000', 'count': 1},
 {'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}]

I am unsure how to match dicts.

Are you sure you have a list of _tuples_? Tuples look like this: `(item, item)`, while dictionaries look like this: `{key:value, key:value}`. So you have a list of dicts — illright, Jun 10 '16 at 08:00
Maybe [this link](http://stackoverflow.com/questions/2600191/how-can-i-count-the-occurrences-of-a-list-item-in-python) could help. Try using `collections.Counter` — Stjepan B, Jun 10 '16 at 08:07
@Levay `Counter` is available from Python 2.7: https://docs.python.org/2/library/collections.html#collections.Counter — Hooting, Jun 10 '16 at 08:18

Andriy Ivaneyko · Answer 1 · 2016-06-10T10:37:07.693

You can achieve that easily with the code below:

items = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

result = {}
counted_items = []
for item in items:
    # Build a string key from both fields (assumes neither value contains '_')
    key = item['x'] + '_' + item['y']
    result[key] = result.get(key, 0) + 1

for key, value in result.items():
    # The key was built as x + '_' + y, so unpack it in that order
    x, y = key.split('_')
    counted_items.append({'x': x, 'y': y, 'count': value})

print(counted_items) # Python 3.7+: [{'x': 'osgb32', 'y': 'osgb4000', 'count': 3}, {'x': 'osgb4340', 'y': 'osgb4000', 'count': 1}, {'x': 'osgb4020', 'y': 'osgb4000', 'count': 1}]

Another option is to use collections.Counter; there are plenty of answers showing how to deal with it :)
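A minimal sketch of that Counter option, assuming every dict has exactly the keys 'x' and 'y'. Here operator.itemgetter('x', 'y') pulls out an (x, y) tuple, which is hashable, so no string joining or splitting on '_' is needed.

```python
from collections import Counter
from operator import itemgetter

items = [{'x': u'osgb32', 'y': u'osgb4000'},
         {'x': u'osgb4340', 'y': u'osgb4000'},
         {'x': u'osgb4020', 'y': u'osgb4000'},
         {'x': u'osgb32', 'y': u'osgb4000'},
         {'x': u'osgb32', 'y': u'osgb4000'}]

# itemgetter('x', 'y') returns an (x, y) tuple, which can be a Counter key
key = itemgetter('x', 'y')
counts = Counter(key(item) for item in items)

# Rebuild one dict per distinct (x, y) pair and attach its count
counted_items = [{'x': x, 'y': y, 'count': n} for (x, y), n in counts.items()]
print(counted_items)
```

On Python 3.7+ the Counter keeps first-seen order, so the result lists osgb32, osgb4340 and osgb4020 in that order.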

Good Luck!

score 3 · Answer 2 · answered Jun 10 '16 at 08:18

This is a job for collections.Counter. But first you have to convert your dicts to tuples, as dicts are not hashable and thus cannot be used as keys in a Counter object:

>>> import collections
>>> dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb4340', 'y': u'osgb4000'},
...          {'x': u'osgb4020', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'}]
>>> collections.Counter(tuple(d.items()) for d in dicts)
Counter({(('y', u'osgb4000'), ('x', u'osgb32')): 3, 
         (('y', u'osgb4000'), ('x', u'osgb4020')): 1, 
         (('y', u'osgb4000'), ('x', u'osgb4340')): 1})

Then, you can turn those back into dicts with the added "count" key:

>>> c = collections.Counter(tuple(d.items()) for d in dicts)
>>> [dict(list(k) + [("count", c[k])]) for k in c]
[{'count': 1, 'x': u'osgb4020', 'y': u'osgb4000'},
 {'count': 3, 'x': u'osgb32', 'y': u'osgb4000'},
 {'count': 1, 'x': u'osgb4340', 'y': u'osgb4000'}]
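A quick check of the hashability point made above, using two entries from the question's data: dicts with the same contents compare equal, but Counter cannot use a dict as a key, while a tuple of its (key, value) pairs works.

```python
import collections

# Two entries taken from the question's data
dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
         {'x': u'osgb32', 'y': u'osgb4000'}]

# Dicts with the same contents compare equal...
same_contents = dicts[0] == dicts[1]

# ...but dict is unhashable, so Counter cannot use a dict as a key
try:
    collections.Counter(dicts)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# A tuple of (key, value) pairs is hashable, so it can be counted
counts = collections.Counter(tuple(d.items()) for d in dicts)
print(same_contents, raised_type_error, counts)
```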

Wouldn't you need to sort the tuples since nothing guarantees that two identical dicts return items in same order? — niemmi, Jun 10 '16 at 08:24
@niemmi Hm, good point... but I think that if all those dicts have the same keys, they should come out in the same order. Using `frozenset` might indeed be better, though. — tobias_k, Jun 10 '16 at 08:26
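A minimal sketch of the ordering concern and the sorting fix niemmi suggests, assuming Python 3.7+ (where a dict's items follow insertion order) and keys that can be compared with each other. The second dict is hypothetical: same contents as the first, built with its keys in the opposite order.

```python
import collections

# Same contents, but the second dict was built with its keys in the opposite order
dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
         {'y': u'osgb4000', 'x': u'osgb32'}]

# On Python 3.7+ item order follows insertion order, so these tuples differ
by_items = collections.Counter(tuple(d.items()) for d in dicts)

# Sorting the (key, value) pairs by key gives one canonical tuple per content
by_sorted_items = collections.Counter(tuple(sorted(d.items())) for d in dicts)

print(len(by_items))         # 2: the two orderings were counted separately
print(len(by_sorted_items))  # 1: both dicts share the same sorted key
```

Because dict keys are unique, sorting the pairs only ever compares keys, so the values do not need to be orderable.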

score 3 · Answer 3 · answered Jun 10 '16 at 08:19

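You can use Counter and frozenset for this. A frozenset of a dict's items is hashable and does not depend on the order in which the keys were inserted: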

from collections import Counter

l = [{'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb4340', 'y': u'osgb4000'},
    {'x': u'osgb4020', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'}]

c = Counter(frozenset(d.items()) for d in l)
[dict(k, count=v) for k, v in c.items()] # [{'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]
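The same frozenset approach can be wrapped in a reusable function. This is a sketch: the name count_unique_dicts and its field parameter are hypothetical, and it assumes every value is hashable and that no input dict already uses the count field as a key.

```python
from collections import Counter


def count_unique_dicts(dicts, field='count'):
    """Hypothetical helper: one dict per distinct content, plus a count field."""
    # frozenset(d.items()) is hashable and ignores key insertion order
    counts = Counter(frozenset(d.items()) for d in dicts)
    # Rebuild each dict from its (key, value) pairs and add the count under `field`
    return [dict(pairs, **{field: n}) for pairs, n in counts.items()]


data = [{'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb4340', 'y': u'osgb4000'},
        {'x': u'osgb4020', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'}]

result = count_unique_dicts(data)
print(result)
```

If a value is itself unhashable (a list, for example), frozenset(d.items()) raises TypeError, so such values would need converting first.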

EdChum · Accepted Answer · 2016-06-10T08:13:38.920

You can pass your list of dicts as the data argument to the DataFrame constructor:

In [74]:
import pandas as pd
data = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)
df

Out[74]:
          x         y
0    osgb32  osgb4000
1  osgb4340  osgb4000
2  osgb4020  osgb4000
3    osgb32  osgb4000
4    osgb32  osgb4000

You can then groupby on the columns and call size to get a count:

In [76]:    
df.groupby(['x','y']).size()

Out[76]:
x         y       
osgb32    osgb4000    3
osgb4020  osgb4000    1
osgb4340  osgb4000    1
dtype: int64

and then call to_dict:

In [77]:    
df.groupby(['x','y']).size().to_dict()

Out[77]:
{('osgb32', 'osgb4000'): 3,
 ('osgb4020', 'osgb4000'): 1,
 ('osgb4340', 'osgb4000'): 1}

You can wrap the above into a list:

In [79]:
[df.groupby(['x','y']).size().to_dict()]

Out[79]:
[{('osgb32', 'osgb4000'): 3,
  ('osgb4020', 'osgb4000'): 1,
  ('osgb4340', 'osgb4000'): 1}]

To get the desired output, call reset_index, rename the resulting 0 column to 'count', and pass orient='records' to to_dict:

In [94]:
df.groupby(['x','y']).size().reset_index().rename(columns={0:'count'}).to_dict(orient='records')

Out[94]:
[{'count': 3, 'x': 'osgb32', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4020', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4340', 'y': 'osgb4000'}]
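A slightly shorter variant of that last step, assuming a reasonably recent pandas: passing name to Series.reset_index labels the counts column directly, so the separate rename step is not needed.

```python
import pandas as pd

data = [{'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb4340', 'y': u'osgb4000'},
        {'x': u'osgb4020', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)

# size() returns a Series indexed by (x, y); reset_index(name=...) turns it
# back into a DataFrame whose counts column is called 'count'
records = (df.groupby(['x', 'y'])
             .size()
             .reset_index(name='count')
             .to_dict(orient='records'))
print(records)
```

groupby sorts the group keys by default, so the records come out ordered by x.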

You should mention that you are using pandas to do the work; your code is missing an import statement: `import pandas as pd` — wagnerpeer, Jun 10 '16 at 08:07
@pwagner sure have updated, the question was originally tagged `pandas` — EdChum, Jun 10 '16 at 08:08
Thanks Ed. Is there a way to include the key in the output? desired outcome - `{'x': 'osgb32', 'y': 'osgb4000', 'count' : 3}` — LearningSlowly, Jun 10 '16 at 08:09
