
I have a list containing dictionaries:

[{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

I wish to count the occurrences of each dict and create a new field, count.

The desired outcome looks like this:

[{'x': u'osgb32', 'y': u'osgb4000', 'count': 3},
 {'x': u'osgb4340', 'y': u'osgb4000', 'count': 1},
 {'x': u'osgb4020', 'y': u'osgb4000', 'count': 1}]

I am unsure how to match dicts.

martineau
LearningSlowly

4 Answers


You can achieve that easily with the code below (Python 3):

items = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]

result = {}
for item in items:
    # Use an (x, y) tuple as the key; joining the strings with '_'
    # would break if a value itself contained an underscore.
    key = (item['x'], item['y'])
    result[key] = result.get(key, 0) + 1

counted_items = []
for (x, y), value in result.items():
    counted_items.append({'x': x, 'y': y, 'count': value})

print(counted_items)  # [{'x': 'osgb32', 'y': 'osgb4000', 'count': 3}, {'x': 'osgb4340', 'y': 'osgb4000', 'count': 1}, {'x': 'osgb4020', 'y': 'osgb4000', 'count': 1}]

Another option is to use collections.Counter; there are plenty of answers on how to deal with it :)
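A minimal sketch of that Counter option, assuming every dict has exactly the keys 'x' and 'y' (the names counts and counted_items are just for illustration). Tuples are hashable, so each (x, y) pair can serve directly as a Counter key:

```python
from collections import Counter

items = [{'x': u'osgb32', 'y': u'osgb4000'},
         {'x': u'osgb4340', 'y': u'osgb4000'},
         {'x': u'osgb4020', 'y': u'osgb4000'},
         {'x': u'osgb32', 'y': u'osgb4000'},
         {'x': u'osgb32', 'y': u'osgb4000'}]

# Count each (x, y) pair; Counter does the get-or-zero-then-add step for us.
counts = Counter((item['x'], item['y']) for item in items)

# Rebuild one dict per distinct pair, adding the count as a new field.
counted_items = [{'x': x, 'y': y, 'count': n} for (x, y), n in counts.items()]
print(counted_items)
```

On Python 3.7+ a Counter keeps first-seen order, so the result lists the pairs in the same order as the desired outcome in the question.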

Good Luck!

Andriy Ivaneyko

This is a job for collections.Counter. But first you have to convert your dicts to tuples, since dicts are not hashable and thus cannot be used as keys in a Counter object:

>>> dicts = [{'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb4340', 'y': u'osgb4000'},
...          {'x': u'osgb4020', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'},
...          {'x': u'osgb32', 'y': u'osgb4000'}]
>>> import collections
>>> collections.Counter(tuple(d.items()) for d in dicts)
Counter({(('y', u'osgb4000'), ('x', u'osgb32')): 3, 
         (('y', u'osgb4000'), ('x', u'osgb4020')): 1, 
         (('y', u'osgb4000'), ('x', u'osgb4340')): 1})

Then, you can turn those back into dicts with the added "count" key:

>>> c = collections.Counter(tuple(d.items()) for d in dicts)
>>> [dict(list(k) + [("count", c[k])]) for k in c]
[{'count': 1, 'x': u'osgb4020', 'y': u'osgb4000'},
 {'count': 3, 'x': u'osgb32', 'y': u'osgb4000'},
 {'count': 1, 'x': u'osgb4340', 'y': u'osgb4000'}]
tobias_k
  • Wouldn't you need to sort the tuples since nothing guarantees that two identical dicts return items in same order? – niemmi Jun 10 '16 at 08:24
  • @niemmi Hm, good point... but I think that if all those dicts have the same keys, they should come out in the same order. Using `frozenset` might indeed be better, though. – tobias_k Jun 10 '16 at 08:26
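To illustrate the ordering concern raised in these comments, here is a minimal sketch assuming Python 3.7+, where dicts keep insertion order (the dicts a and b are made up for the example). Two equal dicts built in a different order produce different item tuples, while sorting the items first, or using frozenset, yields matching keys:

```python
from collections import Counter

a = {'x': 'osgb32', 'y': 'osgb4000'}
b = {'y': 'osgb4000', 'x': 'osgb32'}  # same items, different insertion order

print(a == b)                                                # True: dict equality ignores order
print(tuple(a.items()) == tuple(b.items()))                  # False: tuples follow insertion order
print(tuple(sorted(a.items())) == tuple(sorted(b.items())))  # True
print(frozenset(a.items()) == frozenset(b.items()))          # True

# Sorting by key makes the Counter key independent of insertion order.
# Keys within one dict are unique, so only the keys need to be orderable.
c = Counter(tuple(sorted(d.items())) for d in [a, b])
print(c)  # Counter({(('x', 'osgb32'), ('y', 'osgb4000')): 2})
```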

You can use Counter and frozenset for this:

from collections import Counter

l = [{'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb4340', 'y': u'osgb4000'},
    {'x': u'osgb4020', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'},
    {'x': u'osgb32', 'y': u'osgb4000'}]

# A frozenset is hashable and ignores item order, so equal dicts give equal keys
c = Counter(frozenset(d.items()) for d in l)
[dict(k, count=v) for k, v in c.items()] # [{'y': u'osgb4000', 'x': u'osgb4340', 'count': 1}, {'y': u'osgb4000', 'x': u'osgb32', 'count': 3}, {'y': u'osgb4000', 'x': u'osgb4020', 'count': 1}]
niemmi

You can pass your list of dicts as the data argument to the DataFrame constructor:

In [74]:
import pandas as pd
data = [{'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb4340', 'y': u'osgb4000'},
 {'x': u'osgb4020', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'},
 {'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)
df

Out[74]:
          x         y
0    osgb32  osgb4000
1  osgb4340  osgb4000
2  osgb4020  osgb4000
3    osgb32  osgb4000
4    osgb32  osgb4000

You can then groupby the columns and call size to get a count:

In [76]:    
df.groupby(['x','y']).size()

Out[76]:
x         y       
osgb32    osgb4000    3
osgb4020  osgb4000    1
osgb4340  osgb4000    1
dtype: int64

and then call to_dict:

In [77]:    
df.groupby(['x','y']).size().to_dict()

Out[77]:
{('osgb32', 'osgb4000'): 3,
 ('osgb4020', 'osgb4000'): 1,
 ('osgb4340', 'osgb4000'): 1}

You can wrap the above into a list:

In [79]:
[df.groupby(['x','y']).size().to_dict()]

Out[79]:
[{('osgb32', 'osgb4000'): 3,
  ('osgb4020', 'osgb4000'): 1,
  ('osgb4340', 'osgb4000'): 1}]

To get a list of dicts, call reset_index, rename the resulting 0 column to 'count', and pass orient='records' to to_dict:

In [94]:
df.groupby(['x','y']).size().reset_index().rename(columns={0:'count'}).to_dict(orient='records')

Out[94]:
[{'count': 3, 'x': 'osgb32', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4020', 'y': 'osgb4000'},
 {'count': 1, 'x': 'osgb4340', 'y': 'osgb4000'}]
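A slightly shorter variant of that last step, as a sketch assuming a pandas version whose Series.reset_index accepts the name argument: naming the count column directly in reset_index removes the need for rename(columns={0:'count'}):

```python
import pandas as pd

data = [{'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb4340', 'y': u'osgb4000'},
        {'x': u'osgb4020', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'},
        {'x': u'osgb32', 'y': u'osgb4000'}]
df = pd.DataFrame(data)

# size() counts the rows in each (x, y) group and returns a Series;
# reset_index(name='count') turns it into a DataFrame and names the
# count column in one step, so no rename() is needed.
records = (df.groupby(['x', 'y'])
             .size()
             .reset_index(name='count')
             .to_dict(orient='records'))
print(records)
```

groupby sorts the group keys by default, so the records come out ordered by x, as in Out[94].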
EdChum
  • you should mention that you are using pandas to do the work. Therefore your code is missing an import statement: `import pandas as pd` – wagnerpeer Jun 10 '16 at 08:07
  • @pwagner sure have updated, the question was originally tagged `pandas` – EdChum Jun 10 '16 at 08:08
  • Thanks Ed. Is there a way to include the key in the output? desired outcome - `{'x': 'osgb32', 'y': 'osgb4000', 'count' : 3}` – LearningSlowly Jun 10 '16 at 08:09