15

I have a list of lists in Python, full of text. Each inner list is the set of words from one document, so I have one list per document and then one outer list for all documents.

Each inner list contains only unique words. My goal is to count the occurrences of each word across all documents. I am able to do this successfully using the code below:

# Start from an empty dict so the membership test below works
term_appearance = {}
for x in texts_list:
    for l in x:
        if l in term_appearance:
            term_appearance[l] += 1
        else:
            term_appearance[l] = 1

But I want to use a dictionary comprehension to do the same. This is the first time I am trying to write a dictionary comprehension, and using previous posts on Stack Overflow, I have been able to write the following:

from collections import defaultdict
term_appearance = defaultdict(int)

{{term_appearance[l] : term_appearance[l] + 1 if l else term_appearance[l] : 1 for l in x} for x in texts_list}

Previous post for reference:

Simple syntax error in Python if else dict comprehension

As suggested in above post, I have also used the following code:

{{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}

The above code printed empty lists but ultimately raised the following traceback:

[]
[]
[]
[]
Traceback (most recent call last):
  File "term_count_fltr.py", line 28, in <module>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
  File "term_count_fltr.py", line 28, in <setcomp>
    {{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list}
TypeError: unhashable type: 'dict'

Any help in improving my current understanding would be much appreciated.

Looking at the above error, I also tried

[{l : term_appearance[l] + 1 if l else 1 for l in x} for x in texts_list]

This ran without any error but the output was empty lists only.

Pappu Jha
  • Good luck... Here is a thought: defaultdict(int) defaults to zero, which means you might not need the if-else part. – nehem Oct 08 '15 at 03:18

4 Answers

12

As explained in the other answers, the issue is that a dictionary comprehension creates a new dictionary, and you don't get a reference to that new dictionary until after it has been created. A dictionary comprehension therefore cannot update running counts in an existing dictionary the way your code attempts.
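To make this concrete, here is a minimal sketch of what the list-comprehension variant from the question computes. The two-document `texts_list` and the names `per_doc`, `words` and `word` are made up for illustration. Reading `term_appearance[word]` on a `defaultdict(int)` only inserts a 0; nothing in the comprehension ever writes an incremented value back.

```python
from collections import defaultdict

texts_list = [["a", "b"], ["b", "c"]]  # hypothetical sample data
term_appearance = defaultdict(int)

# Each inner comprehension builds a brand-new dict per document.
# The lookup term_appearance[word] inserts a 0 for missing words,
# but the "+ 1" result is stored only in the new dict.
per_doc = [
    {word: term_appearance[word] + 1 for word in words}
    for words in texts_list
]

print(per_doc)                # [{'a': 1, 'b': 1}, {'b': 1, 'c': 1}]
print(dict(term_appearance))  # {'a': 0, 'b': 0, 'c': 0}
```

So 'b', which appears in both documents, is never counted as 2 anywhere.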

Given that, what you are trying to do is re-implement what collections.Counter already does. You could simply use Counter. Example -

from collections import Counter
term_appearance = Counter()
for x in texts_list:
    term_appearance.update(x)

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> for x in l:
...     term_appearance.update(x)
...
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

If you really want to do this in some kind of comprehension, you can do the following, although using a comprehension purely for its side effects is generally discouraged:

from collections import Counter
term_appearance = Counter()
[term_appearance.update(x) for x in texts_list]

Demo -

>>> l = [[1,2,3],[2,3,1],[5,4,2],[1,1,3]]
>>> from collections import Counter
>>> term_appearance = Counter()
>>> [term_appearance.update(x) for x in l]
[None, None, None, None]
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

The output [None, None, None, None] is the list built by the list comprehension (Counter.update() returns None), echoed because this was run interactively. If you run this in a script as python <script>, that list is simply discarded.


You can also use itertools.chain.from_iterable() to flatten your texts_list into a single iterable of words and pass that directly to Counter. Example:

from collections import Counter
from itertools import chain
term_appearance = Counter(chain.from_iterable(texts_list))

Demo -

>>> from collections import Counter
>>> from itertools import chain
>>> term_appearance = Counter(chain.from_iterable(l))
>>> term_appearance
Counter({1: 4, 2: 3, 3: 3, 4: 1, 5: 1})

Also, there is another issue in this line of your original code -

{{term_appearance[l] : term_appearance[l] + 1 if l else term_appearance[l] : 1 for l in x} for x in texts_list}

Because of the outer braces, this is actually a set comprehension with a dictionary comprehension nested inside (the same is true of the second version shown in your traceback).

This is the reason you are getting the error TypeError: unhashable type: 'dict'. After evaluating the dictionary comprehension and creating a dict, Python tries to add it to the set. But dictionaries are not hashable, hence the error.
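The difference between the two outer bracket styles can be reproduced in isolation. This is a small sketch with made-up sample data; `as_set`, `as_list` and `error_message` are illustrative names. The exact wording of the exception message may vary between Python versions, but it should contain `unhashable type: 'dict'`.

```python
texts_list = [["a", "b"], ["b", "c"]]  # hypothetical sample data

# Outer braces make a set comprehension: every inner dict has to be
# hashed before it can be added to the set, and that fails.
try:
    as_set = {{word: 1 for word in words} for words in texts_list}
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Outer brackets make a list comprehension: a list can hold dicts,
# so this runs, although it still does not accumulate any counts.
as_list = [{word: 1 for word in words} for words in texts_list]
print(as_list)
```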

Anand S Kumar
6

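The reason you are getting the unhashable type error is that a dictionary cannot be used as an element of a set, or as the key of another dictionary, in Python, because dictionaries are mutable containers. In your code the outer braces build a set, so each inner dictionary has to be hashed.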

See: why dict objects are unhashable in python?

Jacob Ritchie
3

Dictionary comprehensions in Python 2.7+ don't work the way you may think they work.

Like list comprehensions, they create a new object, here a new dictionary. You can't use them to add keys to, or update values in, an already existing dictionary (which in this case is what you are trying to do).
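If a dictionary comprehension is still wanted, one possible workaround is to compute each word's total independently instead of incrementing a running count. The sketch below is only an illustration with made-up sample data and an assumed helper name `vocabulary`. It rescans every document for every distinct word, so it is much slower than collections.Counter on large inputs.

```python
from itertools import chain

texts_list = [[1, 2, 3], [2, 3, 1], [5, 4, 2], [1, 1, 3]]  # sample data

# Every distinct word across all documents.
vocabulary = set(chain.from_iterable(texts_list))

# Each value is computed from scratch, so the comprehension never
# needs to read the dictionary it is in the middle of building.
term_appearance = {
    word: sum(words.count(word) for words in texts_list)
    for word in vocabulary
}

print(term_appearance)
```

For this data the counts are 4 for 1, 3 for 2 and 3, and 1 for 4 and 5.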

shafeen
3

Please do look through the answer by Anand S Kumar if you want to use collections.Counter, which is a great suggestion. However, there is another solution using collections.defaultdict that I find worth mentioning:

from collections import defaultdict

# int() returns 0, so a missing word starts at a count of zero
text_appearances = defaultdict(int)

for x in texts_list:
    for l in x:
        text_appearances[l] += 1

I've used this construct a number of times, and I think it is a clean and nice way of doing the count. Especially if you for some reason need to do some verification inside the loop, this is an effective way of updating the count directly without worrying whether the key/word already exists in your dictionary (as in your first solution).
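As a rough sketch of what a check inside the loop could look like, the example below skips a made-up set of stop words while counting. The sample data and the name `stop_words` are assumptions for illustration only, not part of the original question.

```python
from collections import defaultdict

texts_list = [["the", "cat"], ["the", "dog"], ["a", "cat"]]  # hypothetical data
stop_words = {"the", "a"}  # hypothetical filter

text_appearances = defaultdict(int)

for words in texts_list:
    for word in words:
        # Any per-word verification can go here before counting.
        if word in stop_words:
            continue
        text_appearances[word] += 1

print(dict(text_appearances))  # {'cat': 2, 'dog': 1}
```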

Sidenote on variable naming: please don't use lowercase l (lowercase L) as a variable name, as it is hard to distinguish from 1 (the number one). In your case, maybe you could name the variables words and word? If you also drop _list as a suffix, the code could read:

for words in texts:
    for word in words:
        text_appearances[word] += 1
holroy