
I'm trying to create a program that reads in a text file and finds the number of individual words. I have worked out most of it, but I am stuck on getting the counter to pick out words rather than letters, as it is currently doing.

import collections 

with open ("file.txt" ,"r") as myfile:
    data=myfile.read()
[i.split(" ") for i in data]

x=collections.Counter(data)

print (x)

My aim was to split the text by whitespace, which would result in each word being an object in the list. This, however, did not work.

Result:

Counter({' ': 1062, 'e': 678, 't': 544, 'o': 448, 'n': 435, 'a': 405, 'i': 401, 'r': 398, 's': 329, 'c': 268, 'm': 230, 'h': 216, 'u': 212, 'd': 190, 'l': 161, 'p': 148, 'f': 107, 'g': 75, 'y': 68, '\n': 65, ',': 61, 'b': 55, 'w': 55, 'v': 55, '.': 53, 'N': 32, 'A': 20, 'T': 19, '"': 18, ')': 17, '(': 17, 'C': 17, 'k': 16, "'": 16, 'I': 16, 'x': 15, '-': 14, 'E': 13, 'q': 12, 'V': 10, 'U': 9, ';': 7, '1': 6, 'j': 5, '4': 5, 'P': 5, 'D': 5, '9': 5, 'L': 4, 'z': 4, 'W': 4, 'O': 3, 'F': 3, '5': 3, 'J': 2, '3': 2, 'S': 2, 'R': 2, '0': 1, ':': 1, 'H': 1, '2': 1, '/': 1, 'B': 1, 'M': 1, '7': 1})
user3447228
  • Does this answer your question? [How to find the count of a word in a string?](https://stackoverflow.com/questions/11300383/how-to-find-the-count-of-a-word-in-a-string) – Georgy Oct 07 '20 at 10:49

2 Answers


Your list comprehension is never assigned and thus doesn't do anything.
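
A minimal sketch of why the counts come out per character, assuming a short hypothetical string in place of the file contents (the names `data` and `char_counts` are illustrative only):

```python
from collections import Counter

# Hypothetical stand-in for what myfile.read() returns.
data = "the cat\nthe dog"

# Iterating over a string yields single characters, so this builds a list of
# small lists (one per character) and then discards it, because the result
# is never assigned to a name.
[ch.split(" ") for ch in data]

# data is still the original string, so Counter tallies characters, not words.
char_counts = Counter(data)
print(char_counts)
```

Because `data` is unchanged, the whole word `"the"` never appears as a key, while the newline character does.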

Pass the split text to collections.Counter():

x = collections.Counter(data.split())

I used str.split() without arguments so that the text is split on runs of whitespace of any width, including newlines and tabs; your Counter() output contains 65 newline characters that need not be there, for example.
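
To illustrate the difference between the two calls, here is a small sketch on a made-up sample string (the text and variable names are assumptions for the example, not taken from the question's file):

```python
from collections import Counter

# Hypothetical sample containing a newline, a double space, and a tab.
text = "the cat\nthe  dog\tthe end"

# Splitting on a single space leaves newlines and tabs inside the pieces
# and produces an empty string for the double space.
by_space = text.split(" ")

# Splitting with no argument treats any run of whitespace as one separator.
by_whitespace = text.split()

print(by_space)
print(by_whitespace)
print(Counter(by_whitespace)["the"])
```

With `split(" ")`, pieces such as `'cat\nthe'` are counted as single "words", so the tally for `the` comes out too low.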

In context and a little more compact:

from collections import Counter

with open("file.txt") as myfile:
    x = Counter(myfile.read().split())

print(x)
Martijn Pieters
  • Be sure to use `split()` instead of `split(" ")`, like Martijn Pieters did; otherwise the string will not be split at newlines or tabs! – MaSp Mar 21 '14 at 16:41
  • "Your list comprehension is never assigned and thus doesn't do anything." that's not the case in python 2.7. – acushner Mar 21 '14 at 17:04
  • @acushner: The list comprehension expression can execute side effects (not a good idea), but *in this case* the only thing that comprehension produces is a new list object. And it is then discarded because it is no longer used. I know you are talking about `i` leaking to the namespace, but that's hardly helping the OP, is it? – Martijn Pieters Mar 21 '14 at 17:05
  • @acushner: besides, this question is tagged with the `python-3.x` tag, so the Python 2.x list comprehension not having its own scope does not apply here. – Martijn Pieters Mar 21 '14 at 17:06
  • @MartijnPieters, yeah, i read "doesn't do anything" as literally is not executed and i was surprised by that (so i checked it with dis). but i realized afterward that what you meant was that "it doesn't alter the data in a way you would expect." thanks – acushner Mar 21 '14 at 17:07
  • fair enough, but it doesn't matter. it's the same in python 3. – acushner Mar 21 '14 at 17:10

To answer the title: instead of passing the counter a single string, pass it a list of one or more strings.

Then, if your code is:

from collections import Counter
words_count = Counter("tiger")

Remember that a string is a sequence of characters, so that code behaves like:

from collections import Counter
words_count = Counter(["t", "i", "g", "e", "r"])

Otherwise, if your code is:

from collections import Counter
words_count = Counter(["tiger"])

Then each list element is counted as a complete word.
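
The three cases can be compared side by side in a short sketch; the sample sentence and the variable names are made up for illustration:

```python
from collections import Counter

# A bare string is iterated character by character.
letters = Counter("tiger")

# A list with one string counts that string as a single item.
one_word = Counter(["tiger"])

# Splitting a (hypothetical) sentence first gives one list element per word.
words = Counter("the tiger and the cat".split())

print(letters)
print(one_word)
print(words.most_common(1))
```

The last form is the one that matches the question: split the text into words first, then hand the resulting list to `Counter`.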

Eduardo Freitas