Find count of characters within the string in Python

Question

I am trying to create a dictionary of word and number of times it is repeating in string. Say suppose if string is like below

str1 = "aabbaba"

I want to create a dictionary like this

word_count = {'a':4,'b':3}

I am trying to use dictionary comprehension to do this. I did

dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}

This ends up giving an error saying

  File "<stdin>", line 1
    dic = {x:dic[x]+1 if x in dic.keys() else x:1 for x in str}
                                               ^
SyntaxError: invalid syntax

Can anybody tell me what's wrong with the syntax? Also,How can I create such a dictionary using dictionary comprehension?

remove the second x: the first x: is the key for both and the if clause is parsed as part of the value — Erotemic, Dec 03 '16 at 18:10
@dawg I know counter. I don't want to use counter. I want to do this using dictionary comprehension if possible. — Chiyaan Suraj, Dec 03 '16 at 18:29
dict/list/set comprehension are cool and everything but that don't means that they are the optimal solution for everything, this is one of those cases — Copperfield, Dec 03 '16 at 18:52
Please have a look at [this answer](http://stackoverflow.com/questions/26731675/python-count-occurrences-in-a-list-using-dict-comprehension-generator) as well — dawg, Dec 03 '16 at 19:31

dawg · Accepted Answer · 2016-12-03T18:58:39.700

As others have said, this is best done with a Counter.

You can also do:

>>> {e:str1.count(e) for e in set(str1)}
{'a': 4, 'b': 3}

But that traverses the string 1+n times for each unique character (once to create the set, and once for each unique letter to count the number of times it appears. i.e., This has quadratic runtime complexity.). Bad result if you have a lot of unique characters in a long string... A Counter only traverses the string once.

If you want no import version that is more efficient than using .count, you can use .setdefault to make a counter:

>>> count={}
>>> for c in str1:
...    count[c]=count.setdefault(c, 0)+1
... 
>>> count
{'a': 4, 'b': 3}

That only traverses the string once no matter how long or how many unique characters.

You can also use defaultdict if you prefer:

>>> from collections import defaultdict
>>> count=defaultdict(int)
>>> for c in str1:
...    count[c]+=1
... 
>>> count
defaultdict(<type 'int'>, {'a': 4, 'b': 3})
>>> dict(count)
{'a': 4, 'b': 3}

But if you are going to import collections -- Use a Counter!

Why not just `count = collections.defaultdict(int)` once instead of `setdefault` on every iteration? — Michael Kohl, Dec 03 '16 at 18:31
@MichaelKohl to avoid imports? otherwise counter is the better option — Copperfield, Dec 03 '16 at 18:34

score 9 · Answer 2 · edited May 23 '17 at 12:09

Ideal way to do this is via using collections.Counter:

>>> from collections import Counter
>>> str1 = "aabbaba"
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

You can not achieve this via simple dict comprehension expression as you will require reference to your previous value of count of element. As mentioned in Dawg's answer, as a work around you may use list.count(e) in order to find count of each element from the set of string within you dict comprehension expression. But time complexity will be n*m as it will traverse the complete string for each unique element (where m are uniques elements), where as with counter it will be n.

Roman Pekar · Answer 3 · 2016-12-03T18:18:20.640

This is a nice case for collections.Counter:

>>> from collections import Counter
>>> Counter(str1)
Counter({'a': 4, 'b': 3})

It's dict subclass so you can work with the object similarly to standard dictionary:

>>> c = Counter(str1)
>>> c['a']
4

You can do this without use of Counter class as well. The simple and efficient python code for this would be:

>>> d = {}
>>> for x in str1:
...     d[x] = d.get(x, 0) + 1
... 
>>> d
{'a': 4, 'b': 3}

sirfz · Answer 4 · 2016-12-03T19:06:14.467

2

Note that this is not the correct way to do it since it won't count repeated characters more than once (apart from losing other characters from the original dict) but this answers the original question of whether if-else is possible in comprehensions and demonstrates how it can be done.

To answer your question, yes it's possible but the approach is like this:

dic = {x: (dic[x] + 1 if x in dic else 1) for x in str1}

The condition is applied on the value only not on the key:value mapping.

The above can be made clearer using dict.get:

dic = {x: dic.get(x, 0) + 1 for x in str1}

0 is returned if x is not in dic.

Demo:

In [78]: s = "abcde"

In [79]: dic = {}

In [80]: dic = {x: (dic[x] + 1 if x in dic else 1) for x in s}

In [81]: dic 
Out[81]: {'a': 1, 'b': 1, 'c': 1, 'd': 1, 'e': 1}

In [82]: s = "abfg"

In [83]: dic = {x: dic.get(x, 0) + 1 for x in s}

In [84]: dic
Out[84]: {'a': 2, 'b': 2, 'f': 1, 'g': 1}

edited Dec 03 '16 at 19:06

answered Dec 03 '16 at 18:23

sirfz

4,097
23
37

You mean just `dic = {x: dic.get(x, 1) for x in str}` ? Did you defined `dic` anywhere outside? Because within dict comprehension it will have no reference – Moinuddin Quadri Dec 03 '16 at 18:28
this will fail with a `NameError: name 'dic' is not defined` (once you fix the str to str1) – Copperfield Dec 03 '16 at 18:28
@MoinuddinQuadri yep just noticed it, would've been more helpful if you pointed it out. Fixed now – sirfz Dec 03 '16 at 18:31
now you need to reset dic if you want to start over – Copperfield Dec 03 '16 at 18:43
@Copperfield this is not the right way to do it but it answers the OP's question about if-else in comprehensions. – sirfz Dec 03 '16 at 18:44
@Copperfield that's already noted in my answer. Again, this is **not** the correct way to achieve the desired result but the original question is about using if-else inside dict-comprehensions which is what I demonstrate in my answer. It's just for the sake of knowledge. – sirfz Dec 03 '16 at 19:03
my bad, I did not notice your note – Copperfield Dec 03 '16 at 19:05

Find count of characters within the string in Python

4 Answers4

Linked

Related