
I am currently writing a program that takes a text file and counts the frequency of each word in it, after lowercasing every word and stripping its punctuation.

Here is my code:

import sys 
import string

incoming =[]
freq =[]
word =[]
count = 0
index = 0
i = 0

with open(sys.argv[1], "r") as word_list:
    for line in word_list:
        #word is the string of the .txt file

        #strips punctuation and lower cases each word
        for words in line.split():
            words = words.translate(string.maketrans("",""), string.punctuation)
            words = words.lower()
            incoming.append(words)
        #incoming is now an array with each element as a word from the file     

    for i in range(len(incoming)-1):
        if (incoming[i]) not in word:
            #WORD[i] = word[index]
            word[index] = incoming[i]
            freq[index] = 1
            index += 1

        else: 
            freq[index] = freq[index] + 1


    for j in word:
        print "%s %d", word[j], freq[j]

I am getting the error:

  File "wordfreq.py", line 26, in <module>
    word[index] = incoming[i]
IndexError: list assignment index out of range

But I fail to see how it can be out of range. Neither index nor i goes out of range as far as I can tell. I am new to Python and am having a lot of trouble with the `for` loop syntax. Any tips would be much appreciated.

Kommander Kitten
  • In Python, you can iterate through a list simply by doing `for item in list:`. You don't need to use `range(len(list)-1)`. If you still need access to the index, use `for i, item in enumerate(list):`. – Ben Longo Nov 10 '15 at 01:40
  • How does that translate to looping through the index of the array though? Or how can I "number" my items in the list? I'm having trouble wrapping my head around that. – Kommander Kitten Nov 10 '15 at 01:41
  • I would really recommend against using both `WORD` and `word` as variable names in the same source code. – TigerhawkT3 Nov 10 '15 at 01:41
  • Noted. Since they are both arrays of words I figured I could get away with it. But it's understandably not readable for others. Will fix it! – Kommander Kitten Nov 10 '15 at 01:42
  • As a convention, Python variables are only all-uppercase when they are constants (or a matrix), so `ARR` is not quite appropriate here. – simonzack Nov 10 '15 at 01:44
  • Possible duplicate of [For-each over an array in JavaScript?](http://stackoverflow.com/questions/9329446/for-each-over-an-array-in-javascript) – Alberto Bonsanto Nov 10 '15 at 02:00
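Ben Longo's suggestion, and the follow-up question about how to "number" the items, can be illustrated with a short sketch. The list below is made-up sample data standing in for the question's incoming list.

```python
# Made-up, already-normalized words standing in for the question's `incoming`.
incoming = ["the", "cat", "the", "hat"]

# Iterate over the items directly; no index is needed.
for w in incoming:
    print(w)

# When the position is also needed, enumerate() yields (index, item) pairs.
for i, w in enumerate(incoming):
    print(i, w)

# range(len(incoming) - 1) stops one short, so the last word is never visited.
print(list(range(len(incoming) - 1)))  # [0, 1, 2]
```

The original loop `for i in range(len(incoming)-1)` therefore skips the final word of the file, independently of the IndexError.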

2 Answers


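In your code, word starts out as an empty list, so word[index] does not exist: assigning to an index never grows a list, which is why you get the IndexError. What you should do instead is word.append(incoming[i]) (and, for the same reason, freq.append(1) instead of freq[index] = 1).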
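A minimal sketch of the fix while keeping the question's two parallel lists. `count_parallel` is a hypothetical helper name, and `list.index` is an addition not in the original: the original `else` branch increments `freq[index]`, which points one past the last word added rather than at the word being counted.

```python
def count_parallel(incoming):
    """Count word frequencies with two parallel lists (hypothetical helper)."""
    word = []
    freq = []
    for w in incoming:              # visits every element, including the last
        if w not in word:
            # Assigning to word[len(word)] would raise IndexError, because
            # index assignment cannot grow a list; append adds a new slot.
            word.append(w)
            freq.append(1)
        else:
            # Increment the counter at the position of this particular word.
            freq[word.index(w)] += 1
    return word, freq

# Assigning to a position that does not exist yet reproduces the error.
empty = []
try:
    empty[0] = "x"
except IndexError as err:
    print(err)  # list assignment index out of range

words, counts = count_parallel(["the", "cat", "the", "hat"])
print(words, counts)  # ['the', 'cat', 'hat'] [2, 1, 1]
```

Both `w not in word` and `word.index(w)` scan the list, so this approach slows down as the vocabulary grows; a dictionary keyed by word avoids that.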

Phonon
  • I'm getting a different error now. `File "wordfreq.py", line 23, in if (WORD[i]) not in word: TypeError: list indices must be integers, not str` Not sure how to fix this though. I'm new to types in Python. I thought i was considered an int already? – Kommander Kitten Nov 10 '15 at 01:44
  • That line differs from what you posted. I think that's a separate issue. – Phonon Nov 10 '15 at 02:07
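The same kind of TypeError appears whenever a loop variable that holds a list item is used as an index. In the question's final loop, `for j in word` binds `j` to each word (a string), so `word[j]` fails in exactly this way. Separately, `print "%s %d", word[j], freq[j]` prints the format string and the values side by side instead of substituting them. A sketch using `zip` and the `%` operator, in Python 3 syntax, with made-up sample lists:

```python
word = ["the", "cat", "hat"]   # made-up sample data
freq = [2, 1, 1]

# Using the item itself as an index fails: j is a string, not an int.
try:
    for j in word:
        word[j]
except TypeError as err:
    print(type(err).__name__)  # TypeError

# zip() walks both lists in step, pairing each word with its count.
lines = []
for w, n in zip(word, freq):
    lines.append("%s %d" % (w, n))   # % substitutes the values into the string
print("\n".join(lines))
```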

A better approach might be to use a defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ["abc", "abc", "def"]:
...     d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'abc': 2, 'def': 1})
>>>

This is a more Pythonic way to count frequencies than maintaining parallel lists and indexes. The words are in d.keys() and their frequencies are in d.values(); d.items() gives (word, count) pairs.
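Putting the pieces together, here is a sketch of the whole program built on `defaultdict`, written for Python 3. The question's code is Python 2: in Python 3 `string.maketrans` no longer exists, and `str.maketrans("", "", string.punctuation)` builds a table that deletes punctuation. The function name `word_frequencies` is hypothetical, and skipping tokens that were nothing but punctuation is an assumption not stated in the question.

```python
import string
import sys
from collections import defaultdict

# Translation table that deletes every ASCII punctuation character.
STRIP_PUNCT = str.maketrans("", "", string.punctuation)

def word_frequencies(lines):
    """Map each lowercased, punctuation-free word to its count (hypothetical helper)."""
    counts = defaultdict(int)          # missing keys start at 0
    for line in lines:
        for token in line.split():
            w = token.translate(STRIP_PUNCT).lower()
            if w:                      # assumption: skip punctuation-only tokens
                counts[w] += 1
    return counts

if __name__ == "__main__" and len(sys.argv) > 1:
    # As in the question, the filename comes from the first command-line argument.
    with open(sys.argv[1]) as f:
        for w, n in sorted(word_frequencies(f).items()):
            print("%s %d" % (w, n))
```

Because a file object yields its lines one at a time, `word_frequencies` accepts either an open file or any list of strings, which also makes it easy to try interactively.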

fiacre
  • Or even [`collections.Counter`](https://docs.python.org/3/library/collections.html#collections.Counter) – wwii Nov 10 '15 at 01:56
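wwii's suggestion, sketched in Python 3: `collections.Counter` counts an iterable in one call and adds `most_common()`. The sample text is made up, and punctuation is stripped with the same `str.maketrans` table idea used for Python 3.

```python
import string
from collections import Counter

text = "The cat saw the hat. The hat saw the cat!"    # made-up sample text
table = str.maketrans("", "", string.punctuation)   # deletes punctuation
words = text.translate(table).lower().split()

counts = Counter(words)
print(counts["the"])          # 4
print(counts.most_common(1))  # [('the', 4)]
print(counts["dog"])          # 0: missing words count as zero
```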