
Why does defaultdict count the empty spaces in my string?

I count the number of times each character appears in a string using defaultdict, but my code also counts the empty spaces between the words. How do I count only the occurrences of words and omit the empty spaces between them?

from collections import defaultdict

def count_var(word):
    d = defaultdict(int)
    for val in word:
        d[val]+=1
    return d

ct = count_var('big data examiner')


print ct

defaultdict(<type 'int'>, {'a': 3, ' ': 2, 'b': 1, 'e': 2, 'd': 1, 'g': 1, 'i': 2, 'm': 1, 'n': 1, 'r': 1, 't': 1, 'x': 1})
dangerous
  • Why wouldn't it? And why don't you just use `collections.Counter`?! – jonrsharpe Feb 16 '15 at 13:56
  • @jonrsharpe can you replicate the same code using counter? – dangerous Feb 16 '15 at 13:59
  • Why not [read the docs](https://docs.python.org/2/library/collections.html#collections.Counter), try it, and see? But bear in mind that people rarely ask questions like that when the answer is *"because you can't use [whatever] to do this"*... – jonrsharpe Feb 16 '15 at 13:59
  • @jonrsharpe but even counter counts the empty spaces. – dangerous Feb 16 '15 at 14:02
  • @dangerous oh for pity's... `Counter` **just simplifies your code**. If you want it to count *words*, rather than *characters* (including spaces), you have to pass it **words rather than characters**. – jonrsharpe Feb 16 '15 at 14:03
  • @jonrsharpe - I hope this isn't a stupid question. I couldn't find the answer by searching Google, that's why I posted it here. OK, I will keep this in mind the next time I post. – dangerous Feb 16 '15 at 14:04
  • You can find [whether your character, or string, is whitespace](http://stackoverflow.com/questions/2405292/how-to-check-if-text-is-empty-spaces-tabs-newlines-in-python) by saying `val.isspace()`. Then you can choose whether to add it to your count. – Peter Wood Feb 16 '15 at 14:06

2 Answers


Change this line

ct = count_var('big data examiner')

To

ct = count_var('big data examiner'.split())

This will count words instead of characters. As for why it was counting spaces: a space is a valid character, just like any letter or digit, so it gets counted along with the rest.
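A minimal sketch of why the change works, reusing the question's `count_var` (the parameter name `items` and the variable `words` are only illustrative): looping over a string yields one character at a time, while looping over the list returned by `split()` yields whole words.

```python
from collections import defaultdict

def count_var(items):
    # Works for any iterable: a string yields characters, a list yields its elements
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts

# split() with no argument breaks the string on whitespace
words = 'big data examiner'.split()

print(words)
print(dict(count_var(words)))
```

Since the spaces are consumed as delimiters by `split()`, they never reach the counting loop.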

Also note that collections.Counter is better suited to this problem, especially since you are already importing from collections.

Edit

As for how to use collections.Counter, the same idea as above applies.

This counts characters

>>> from collections import Counter
>>> Counter('big data examiner')
Counter({'a': 3, 'i': 2, 'e': 2, ' ': 2, 't': 1, 'b': 1, 'n': 1, 'd': 1, 'm': 1, 'g': 1, 'x': 1, 'r': 1})

This counts words

>>> Counter('big data examiner'.split())
Counter({'big': 1, 'data': 1, 'examiner': 1})

Edit #2: Counting all non-space characters

You can use str.replace(' ', '') to strip the spaces before counting:

>>> from collections import Counter
>>> Counter('big data examiner'.replace(' ', ''))
Counter({'a': 3, 'i': 2, 'e': 2, 'x': 1, 'b': 1, 'r': 1, 'g': 1, 'n': 1, 't': 1, 'm': 1, 'd': 1})
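Note that `replace(' ', '')` removes only the space character itself. If the input might also contain tabs or newlines (an assumption, since the question's string has only spaces), one possible sketch is to split on all whitespace and join the pieces back together. The sample `text` below is made up for illustration.

```python
from collections import Counter

text = 'big\tdata\nexaminer  '  # hypothetical input with mixed whitespace

# replace(' ', '') strips spaces but leaves the tab and newline in place
only_spaces_removed = Counter(text.replace(' ', ''))

# split() with no argument splits on any run of whitespace,
# so joining the pieces drops tabs and newlines as well
all_whitespace_removed = Counter(''.join(text.split()))

print('\t' in only_spaces_removed)     # True
print('\t' in all_whitespace_removed)  # False
```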
Cory Kramer
  • what does split do here? – dangerous Feb 16 '15 at 13:58
  • @dangerous what it always does: [read the docs](https://docs.python.org/2/library/stdtypes.html#str.split)! – jonrsharpe Feb 16 '15 at 13:58
  • @dangerous [`str.split`](https://docs.python.org/2/library/stdtypes.html#str.split) will parse a string into a list of tokens based on the provided delimiter. If no delimiter is provided, it will split on whitespace. – Cory Kramer Feb 16 '15 at 13:59
  • @Cyber I want my output like this. I don't want words, I just need the characters, excluding the empty spaces: Counter({'a': 3, 'i': 2, 'e': 2, 't': 1, 'b': 1, 'n': 1, 'd': 1, 'm': 1, 'g': 1, 'x': 1, 'r': 1}) – dangerous Feb 16 '15 at 14:11
  • @dangerous why don't you just ignore the spaces in whatever you use it for next? – jonrsharpe Feb 16 '15 at 14:14
  • @jonrsharpe - OK, I will ignore the spaces then. I was curious how to do that in Python. – dangerous Feb 16 '15 at 14:17
  • @dangerous Edited again. – Cory Kramer Feb 16 '15 at 14:19

To answer the specific question:

Why does defaultdict count the empty spaces in my string?

Because the spaces are still characters. For example:

>>> list('big data examiner')
['b', 'i', 'g', ' ', 'd', 'a', 't', 'a', ' ', 'e', 'x', 'a', 'm', 'i', 'n', 'e', 'r']
               # ^                        ^

As currently written, your code counts every character, including spaces. If you want to exclude spaces from the count, you need to make that explicit:

def count_var(word):
    d = defaultdict(int)
    for val in word:
        if val != ' ':  # exclude spaces
            d[val]+=1
    return d

Alternatively, rather than excluding ' ' from the counting process, simply don't use that key in whatever you do with d next.
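A rough sketch of that alternative, using the original `count_var` unchanged: either skip the `' '` key while reading the result, or drop it once counting is done.

```python
from collections import defaultdict

def count_var(word):
    d = defaultdict(int)
    for val in word:
        d[val] += 1
    return d

ct = count_var('big data examiner')

# Option 1: skip the space key while reading, leaving ct untouched
for char in sorted(ct):
    if char != ' ':
        print('%s: %d' % (char, ct[char]))

# Option 2: remove the key after counting; the None default avoids
# a KeyError when the input contained no spaces
ct.pop(' ', None)
```

Either way the counting loop stays simple, and the decision about which keys matter is made where the result is used.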


Note that collections also provides Counter, which can significantly simplify your code:

>>> from collections import Counter
>>> Counter(char for char in 'big data examiner' if char != ' ')
Counter({'a': 3, 'e': 2, 'i': 2, 'b': 1, 'd': 1, 'g': 1, 'm': 1, 'n': 1, 'r': 1, 't': 1, 'x': 1})
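If tabs and newlines should be skipped as well as spaces, the same generator can test `char.isspace()` instead, as suggested in the comments on the question. This is a sketch; the helper name `count_non_whitespace` and the sample input are made up for illustration.

```python
from collections import Counter

def count_non_whitespace(text):  # hypothetical helper name
    # str.isspace() is True for spaces, tabs, newlines and other whitespace
    return Counter(char for char in text if not char.isspace())

ct = count_non_whitespace('big\tdata\nexaminer')
print(ct['a'])  # 3
```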
jonrsharpe