
I have a document that consists of labels like this:

201
202
205
201
203
204
201

If I want to count the number of occurrences of each label and print the result, how can I do so in Python?

What I am trying to get is:

201: 3
202: 1
204: 1 
Iron Fist
minks

3 Answers


Use a Counter from the collections module to map each label (as a string) to its count:

>>> from collections import Counter
>>> s = '''
201
202
205
201
203
204
201
'''
>>> c = Counter()
>>> for d in s.rstrip().split():
...     c[d] += 1
...
>>> c
Counter({'201': 3, '205': 1, '204': 1, '203': 1, '202': 1})

Or as suggested by Kevin Guan:

>>> c = Counter(s.rstrip().split())
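To print the counts in the `label: count` form the question asks for, the Counter can be iterated in sorted key order. This is only a minimal sketch: the sample string and the names `counts` and `lines` are illustrative assumptions, not part of the original answer.

```python
from collections import Counter

# Sample input, assumed to match the labels listed in the question.
s = "201\n202\n205\n201\n203\n204\n201\n"

# split() with no arguments splits on any whitespace, including newlines.
counts = Counter(s.split())

# Sort by label so the output order is deterministic.
lines = ["%s: %d" % (label, n) for label, n in sorted(counts.items())]
print("\n".join(lines))
```

Sorting is optional; without it, the labels come out in whatever order the Counter yields them.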

EDIT:

A simpler, though slower, alternative uses list.count (each call rescans the whole list, so this is quadratic in the number of labels):

>>> l = s.rstrip().split()
>>> l
['201', '202', '205', '201', '203', '204', '201']
>>> c = [l.count(x) for x in l]
>>> 
>>> c
[3, 1, 1, 3, 1, 1, 3]
>>> 
>>> d = dict(zip(l,c))
>>> 
>>> d
{'205': 1, '201': 3, '203': 1, '204': 1, '202': 1}

And if you are a fan of one-liners:

>>> l = s.rstrip().split()
>>>
>>> dict(zip(l,map(l.count, l)))
{'205': 1, '204': 1, '201': 3, '203': 1, '202': 1}
>>>
>>> dict(zip(set(l),map(l.count, set(l))))
{'205': 1, '201': 3, '203': 1, '204': 1, '202': 1}
Iron Fist

Try this:

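import itertools

with open("your_document") as f:
    # groupby only groups adjacent equal items, so sort the labels first
    lines = sorted(map(int, f.read().split()))
    for label, group in itertools.groupby(lines):
        # the size of each group is the number of occurrences of that label
        print("%d: %d" % (label, len(list(group))))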

If your document is huge (gigabytes in size), read it line by line instead of loading it all at once:

import collections
my_dict = collections.defaultdict(int)
with open("your_document") as f:
    for line in f:
        # strip the trailing newline so '201\n' and '201' count as the same label
        my_dict[line.strip()] += 1

Output:

>>> my_dict
defaultdict(<type 'int'>, {'201': 3, '203': 1, '202': 1, '205': 1, '204': 1})

Without collections or itertools:

my_dict = {}
with open("your_document") as f:
    for line in f:
        line = line.strip()
        my_dict[line] = my_dict.get(line, 0) + 1
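
The plain-dict version above can be wrapped in a small function and checked against the labels from the question. This is only a sketch: the function name `count_labels`, the blank-line handling, and the temporary demo file are assumptions added for illustration.

```python
import os
import tempfile


def count_labels(path):
    """Return a dict mapping each non-blank, stripped line of `path` to its count."""
    counts = {}
    with open(path) as f:
        for line in f:  # reads one line at a time, so large files are fine
            label = line.strip()
            if not label:
                continue  # ignore blank lines
            counts[label] = counts.get(label, 0) + 1
    return counts


# Demo: write the question's labels to a temporary file, then count them.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("201\n202\n205\n201\n203\n204\n201\n")
    demo_path = tmp.name

result = count_labels(demo_path)
os.remove(demo_path)

for label in sorted(result):
    print("%s: %d" % (label, result[label]))
```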
Hackaholic

You can use the readlines() method to get a list of lines, then pass it to Counter from the collections module to get the count for each label.

>>> from collections import Counter
>>> with open('text.txt') as f:
...     c = Counter(map(str.strip, f.readlines()))
...     print(c)
... 
Counter({'201': 3, '205': 1, '202': 1, '204': 1, '203': 1})
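
A variation that avoids building the full list with readlines() is to feed Counter a generator over the file object. The sketch below uses `io.StringIO` as a stand-in for an open text file, which is an assumption made so the example runs on its own.

```python
import io
from collections import Counter

# io.StringIO stands in for the file object returned by open('text.txt').
f = io.StringIO(u"201\n202\n205\n201\n203\n204\n201\n")

# Iterating the file yields one line at a time; blank lines are skipped.
c = Counter(line.strip() for line in f if line.strip())

# most_common(1) returns a list holding the single most frequent (label, count) pair.
top_label, top_count = c.most_common(1)[0]
print(top_label, top_count)
```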
styvane