1

I wonder how to read character strings the way fscanf does. I need to read every word in a .txt file and keep a count for each word.

import collections

collectwords = collections.defaultdict(int)

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if char != " ":
                v = v + char
            else:
                collectwords[v] += 1
                v = ""

This way, I can't read the last word.

a3rxander

3 Answers

3

Uhm, like this?

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        for word in line.split():
            collectwords[word] += 1
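
As for why the loop in the question drops the last word: a word is only stored when a space follows it, and the last word of a line is followed by a newline (or the end of the file) instead. A minimal sketch of a character-by-character version that also flushes the pending word, assuming Python 3; the helper name count_words_by_char is made up for illustration:

```python
import collections


def count_words_by_char(lines):
    """Count words by scanning characters (hypothetical helper)."""
    counts = collections.defaultdict(int)
    for line in lines:
        word = ""
        for char in line:
            if char.isspace():
                # Any whitespace (space, tab, newline) ends the current word.
                if word:
                    counts[word] += 1
                word = ""
            else:
                word += char
        # Flush the last word when the line has no trailing newline.
        if word:
            counts[word] += 1
    return counts


# Usage with the file from the question:
# with open('DatoSO.txt', 'r') as filetxt:
#     collectwords = count_words_by_char(filetxt)
```

Unlike the original loop, this skips the empty strings that runs of consecutive spaces would otherwise produce.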
poke
3

You might also consider using collections.Counter if you are using Python >= 2.7:

http://docs.python.org/library/collections.html#collections.Counter

It adds a number of methods like 'most_common', which might be useful in this type of application.

From Doug Hellmann's PyMOTW:

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(3):
    print '%s: %7d' % (letter, count)

Source: http://www.doughellmann.com/PyMOTW/collections/counter.html. Note that this counts letters rather than words. In the c.update line, you would want to replace line.rstrip().lower() with line.split(), and perhaps add some code to get rid of punctuation.
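
The word-count variant described above might look roughly like this. It is a sketch assuming Python 3 (so print is a function); count_words and the sample lines are made up for illustration, and lowercasing is kept from the PyMOTW example:

```python
import collections


def count_words(lines):
    """Count whitespace-separated words, ignoring case (hypothetical helper)."""
    c = collections.Counter()
    for line in lines:
        # split() with no arguments also discards the trailing newline.
        c.update(line.lower().split())
    return c


sample = ["the cat and the hat\n", "The end\n"]  # stand-in for an open file
print('Most common:')
for word, count in count_words(sample).most_common(3):
    print('%s: %7d' % (word, count))
```

Passing an open file object instead of the sample list works the same way, since both yield one line at a time.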

Edit: To remove punctuation, this is probably the fastest solution:

import collections
import string

c = collections.Counter()
with open('DataSO.txt', 'rt') as f:
    for line in f:
        c.update(line.translate(string.maketrans("",""), string.punctuation).split())

(borrowed from the question "Best way to strip punctuation from a string in Python")
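
The translate call above relies on Python 2, where str.translate accepts a second argument listing characters to delete. Under Python 3 the table is built with str.maketrans instead. A minimal sketch, assuming Python 3; count_words_no_punct is a made-up name:

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def count_words_no_punct(lines):
    """Count words after removing ASCII punctuation (hypothetical helper)."""
    c = collections.Counter()
    for line in lines:
        c.update(line.translate(_STRIP_PUNCT).split())
    return c


# Usage with a file:
# with open('DataSO.txt', 'rt') as f:
#     c = count_words_no_punct(f)
```

Keep in mind that string.punctuation includes the apostrophe, so a word such as "don't" is counted as "dont".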

JoshAdel
1

Python makes this easy:

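collectwords = []

# The with block closes the file once all lines are read.
with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        # split() breaks on any whitespace, so the last word is included.
        collectwords.extend(line.split())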
Artfunkel