
I was trying to use a dictionary to count word frequencies in a given string. Say:

s = 'I ate an apple a big apple'

I understand the best way to count word frequency is probably to use collections.Counter. But I want to know if I can solve this by using a dictionary comprehension.

My original method (without a dictionary comprehension) was

dict = {}
for token in s.split(" "):
    dict[token] = dict.get(token, 0) + 1

and it works fine:

dict
{'I': 1, 'a': 1, 'an': 1, 'apple': 2, 'ate': 1, 'big': 1}

I tried to use a dictionary comprehension to do this, like

dict = {}
dict = {token: dict.get(token, 0) + 1 for token in s.split(" ")}

But this didn't work.

dict
{'I': 1, 'a': 1, 'an': 1, 'apple': 1, 'ate': 1, 'big': 1}

What's wrong with the dictionary comprehension? Is it because the comprehension refers to dict itself, so that every call to dict.get('apple', 0) inside the comprehension returns 0? I don't know how to test this, so I am not 100% sure.

P.S. If it makes any difference, I am using Python 3.

lanrete
  • 2
    This is what `collections.Counter` (a dict subtype) solved long ago – Moses Koledoye Nov 15 '16 at 14:07
  • 1
    I wouldn't use dict as a variable name since it's a built-in; you could break something by doing so – e4c5 Nov 15 '16 at 14:07
  • 1
    The variable `dict` isn't being updated until the comprehension is fully calculated, so `dict.get(token, 0)` inside the comprehension is only ever consulting the empty dictionary from the previous line. – khelwood Nov 15 '16 at 14:08
  • @MosesKoledoye Agreed, I just want to dig into the mechanics behind dict comprehensions. – lanrete Nov 17 '16 at 02:48

3 Answers


If you go through your code operation by operation, you will see what is wrong.

First you set dict to an empty dict. (As mentioned in the comments, it's a bad idea to use that for your own variable name, but that's not the problem here.)

Secondly, your dict comprehension is evaluated. At this point the name dict still refers to the empty dict. So every time you do dict.get(whatever, 0), it will always get the default.

Finally, your populated dict is reassigned to the name dict, replacing the empty one that was previously there.
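One way to check this order of evaluation is to record what the name refers to while the comprehension runs. The following is a minimal sketch, not part of the original code: the function `trace_comprehension`, the helper `lookup`, and the list `sizes_seen` are hypothetical names introduced for illustration, and `counts` stands in for the question's `dict` to avoid shadowing the built-in.

```python
def trace_comprehension(s):
    counts = {}        # the empty dict assigned before the comprehension
    sizes_seen = []    # size of `counts` at each lookup (illustrative only)

    def lookup(token):
        # Record how many entries `counts` holds at the moment of the call.
        sizes_seen.append(len(counts))
        return counts.get(token, 0)

    # The name `counts` is rebound only after the whole comprehension finishes.
    counts = {token: lookup(token) + 1 for token in s.split(" ")}
    return counts, sizes_seen


counts, sizes_seen = trace_comprehension('I ate an apple a big apple')
print(sizes_seen)       # [0, 0, 0, 0, 0, 0, 0]
print(counts['apple'])  # 1
```

Every lookup sees an empty dictionary, so each token gets the default 0 plus 1, including the second 'apple'.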

Daniel Roseman

You could also use list.count(), as:

s = 'I ate an apple a big apple'

print({token: s.split().count(token) for token in set(s.split())})
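
A possible variant, sketched here under the assumption that the string is split on whitespace: splitting once and reusing the list avoids re-splitting `s` for every distinct token. The name `tokens` is introduced for illustration. Each `count()` call still rescans the whole list, so this stays slower than `collections.Counter` on long inputs; the last line only checks that both approaches agree.

```python
from collections import Counter

s = 'I ate an apple a big apple'

tokens = s.split()  # split once and reuse the list
counts = {token: tokens.count(token) for token in set(tokens)}

print(counts['apple'])                  # 2
print(counts == dict(Counter(tokens)))  # True
```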
Ivan Chaer

For your dictionary comprehension to work, you would need a reference to the dictionary being built inside the comprehension itself. Something like this would work

{token: __me__.get(token, 0) + 1 for token in s.split(" ")}

if there were such a thing as '__me__' referencing the comprehension being built. In Python 3 there is no documented way to do this.

According to this answer, an undocumented "implementation artifact" (on which Python users should not rely) can be used in Python 2.5 and 2.6 to write a self-referencing list comprehension. Maybe a similar hack exists for dictionary comprehensions in Python 3 too.
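
Without relying on any such artifact, the effect of a self-reference can be approximated by keeping a separate accumulator that is updated as a side effect while the comprehension runs. This is only a sketch of the idea, not a recommended style: `count_via_side_effect` and `acc` are hypothetical names, and a plain loop or `collections.Counter` is clearer.

```python
def count_via_side_effect(s):
    acc = {}  # separate accumulator, visible while the comprehension runs
    # dict.update() returns None, so `or acc[token]` yields the fresh count.
    # A repeated key overwrites its earlier entry, leaving the final count.
    return {
        token: acc.update({token: acc.get(token, 0) + 1}) or acc[token]
        for token in s.split(" ")
    }


counts = count_via_side_effect('I ate an apple a big apple')
print(counts['apple'])  # 2
```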

SergiyKolesnikov