I am trying to count the words in a file 'xxxx' by building a dictionary in which the keys are the words and the values are the number of occurrences. So far I have this:

fil = open("xxxx","r")
X = fil.read()

count = {}
for key in X.split():
  count[key] += 1

for i in count:
  print (i, count[i])

When I run this, I get:

Traceback (most recent call last):
  File "countword.py", line 9, in <module>
    count[key] = count[key] + 1
KeyError: 'From'

'From' is the first word in the file, and since there is no key 'From' in the dictionary yet, I believe that is the cause of the error. But what is the right way to do this? Also, do I need to initialise the value somehow before entering the for loop?

user2311285
  • Use `collections.defaultdict` – Moses Koledoye Apr 22 '17 at 18:04
  • The first time you encounter a word, the key does not exist so `count[key]` fails, look at a defaultdict. – roganjosh Apr 22 '17 at 18:04
  • Your basic problem is that you are trying to add to a value that isn't there. This is something `collections.defaultdict` or `dict.get()` can both solve, but the far better solution is to use `collections.Counter()` to do the counting for you. – Martijn Pieters Apr 22 '17 at 18:06
  • @MartijnPieters _everyone_ here sings praises for `Counter` but I have always found `.get(value, 0) + 1` faster as a counter and it was also recommended by Raymond Hettinger in one of his keynotes. Is there an objective reason that it's better, other than slightly less ambiguous code, that I'm missing? – roganjosh Apr 22 '17 at 18:09
  • @roganjosh: That's no longer true, `Counter` in Python 3 uses a loop in C to speed up the counting. – Martijn Pieters Apr 22 '17 at 18:13
  • @roganjosh: it is the *rest of the functionality* that makes `Counter` great however. – Martijn Pieters Apr 22 '17 at 18:14
  • @MartijnPieters then that makes perfect sense if it changed in P3, I'm stuck on P2 for work. For anything other than the basic case, `Counter` made sense, but I couldn't understand why it was always pushed when it came short in my benchmarks. Thanks. – roganjosh Apr 22 '17 at 18:15
  • @roganjosh Porting to Python 3 is getting easier and easier. I'd push for conversion at work. Python 2 is legacy and going away soon (3 years until End-of-life!) – Martijn Pieters Apr 22 '17 at 18:17
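
The `collections.defaultdict` suggestion from the comments above could look roughly like the sketch below. The sample string `text` is an assumption standing in for the contents of the file; everything else is standard library.

```python
from collections import defaultdict

# Sample text standing in for the file contents (an assumption for illustration).
text = "From A to B and from B to A"

# defaultdict(int) creates a missing key with the value int() == 0 on first
# access, so `+= 1` works even the first time a word is seen.
count = defaultdict(int)
for word in text.split():
    count[word] += 1

for word, n in count.items():
    print(word, n)
```

Note that the split is case-sensitive, so 'From' and 'from' are counted as two different words here.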

1 Answer

Use a Counter:

from collections import Counter

X = "From A to B"

count = Counter()
for key in X.split():
    count[key] += 1

count
# Counter({'A': 1, 'B': 1, 'From': 1, 'to': 1})
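
A shorter variant of the same idea: `Counter` also accepts an iterable directly, so the explicit loop can be skipped. This is a minimal sketch using the same sample string `X`; the call to `most_common(2)` is only there to illustrate one of the extra methods `Counter` offers.

```python
from collections import Counter

X = "From A to B"

# Counter counts the items of any iterable passed to its constructor.
count = Counter(X.split())

# most_common(n) returns the n most frequent (word, count) pairs.
print(count.most_common(2))
```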
Psidom
  • This works, thanks, but I would be interested in a solution without built-in specialised classes. How would I do this with just an ordinary Python dictionary? – user2311285 Apr 23 '17 at 05:11
  • Ok, I just read up on it and I realise that because of the new keys this can be done through either counter or default dict. – user2311285 Apr 23 '17 at 05:48
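
For the plain-dictionary version asked about in the comments above, one possible sketch uses `dict.get` with a default of 0, or an explicit membership test. The sample string `text` is an assumption standing in for the file contents.

```python
# Sample text standing in for the file contents (an assumption for illustration).
text = "From A to B and back to A"

# Option 1: dict.get returns the default (0) when the word is not yet a key.
count = {}
for word in text.split():
    count[word] = count.get(word, 0) + 1

# Option 2: check for the key explicitly before incrementing.
count_explicit = {}
for word in text.split():
    if word in count_explicit:
        count_explicit[word] += 1
    else:
        count_explicit[word] = 1

for word in count:
    print(word, count[word])
```

Both loops produce the same dictionary; neither needs the values initialised before the loop, because the first occurrence of each word is handled inside it.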