
I am writing a program that builds a word cloud. I want a list of words with punctuation and commonly used words removed. I removed the punctuation with the function removepunc, which works fine. Now I am writing a second function to remove commonly used words (I am not reusing the character-by-character logic, because it would strip the letter "i" out of every word along with the pronoun "I"). After converting the file's contents into a list, I get the error IndexError: list index out of range.

CODE:

def removepunc(z):
    test_str = z
    punc = r'''!()-[]{};:'""\,<>./?@#$%^&*_~'''
    for ele in test_str:
        if ele in punc:
            test_str = test_str.replace(ele, "")
    return test_str

def removebad(f):
    print(type(f))
    z=[]
    badword2 = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", 
    "my","we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", 
    "hers", "its", "they","them","their", "what", "which", "who", "whom", "this", "that", 
    "am", "are", "was", "were", "be", "been","being","have", "has", "had", "do", "does", 
    "did", "but", "at", "by", "with", "from", "here", "when", "where","how","all", "any", 
    "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", 
    "will","just"]
    for i in range (len(f)-1):
        if f[i] in badword2:
            x=f.pop(i)
            z.append(x)
        else:
            continue
    return f



file=open("openfile.txt")
a=file.read()
a=a.lower()
unqword=removepunc(a)
ab=unqword.split()
print(type(ab))
unqword1=removebad(ab)
print(unqword1)


OUTPUT:

C:\Users\Nitin\PycharmProjects\pythonProject1\venv\Scripts\python.exe C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py
<class 'list'>
<class 'list'>
Traceback (most recent call last):
  File "C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py", line 29, in <module>
    unqword1=removebad(ab)
  File "C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py", line 14, in removebad
    if f[i] in badword2:
IndexError: list index out of range

Process finished with exit code 1

I have not written the word cloud logic yet; I will do that once I get rid of this error.

    You can't modify a list while you're iterating through it. The `range` object is created once when the `for` loop starts, and will run through the length of the list at that time. When you pop items, you make the list shorter, but the range doesn't know that. The best plan in a case like this is just to create two brand new lists: one with the ones to keep, one with the ones to toss. Then you don't even have to use `range(len())`, you can do `for item in f:`. – Tim Roberts Dec 31 '21 at 06:57
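
To make the comment above concrete, here is a minimal sketch that reproduces the failure on a small list. The function name `remove_while_iterating` and the sample words are hypothetical and exist only for illustration; the loop itself mirrors the one in the question.

```python
def remove_while_iterating(words, stop_words):
    """Mimic the question's loop: pop items while indexing over a fixed range."""
    words = list(words)  # work on a copy so the caller's list is untouched
    try:
        # range() is evaluated once, using the length of the list at this moment
        for i in range(len(words) - 1):
            if words[i] in stop_words:
                words.pop(i)  # the list shrinks, but the range does not
    except IndexError:
        return "IndexError", words
    return "ok", words


status, remaining = remove_while_iterating(
    ["the", "cat", "is", "on", "a", "mat"], {"the", "is", "a"}
)
print(status, remaining)  # IndexError ['cat', 'on', 'mat']

# Even when no exception is raised, items can be skipped silently:
status2, remaining2 = remove_while_iterating(["the", "a", "cat"], {"the", "a"})
print(status2, remaining2)  # ok ['a', 'cat']
```

In the second call, popping "the" shifts "a" into index 0, which the loop has already passed, so "a" is never checked.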

2 Answers


Create a new list with the words to keep:

def removebad(f):
    print(type(f))
    badword2 = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", 
    "my","we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", 
    "hers", "its", "they","them","their", "what", "which", "who", "whom", "this", "that", 
    "am", "are", "was", "were", "be", "been","being","have", "has", "had", "do", "does", 
    "did", "but", "at", "by", "with", "from", "here", "when", "where","how","all", "any", 
    "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", 
    "will","just"]
    keep = []
    toss = []
    for w in f:
        if w in badword2:
            toss.append( w )
        else:
            keep.append( w )
    return keep

Of course, I don't understand why you're keeping the "toss" words (your z list), since you don't do anything with it. Without that, it could just be:

    return [w for w in f if w not in badword2]
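
As a self-contained sketch of that one-line version, the following wraps the comprehension in a function and stores the stop words in a set, which makes each membership test faster than scanning a list. The names `STOP_WORDS` and `remove_stop_words` are hypothetical, and the set holds only a shortened subset of the original list to keep the example brief.

```python
# Hypothetical, shortened subset of the stop-word list from the question.
STOP_WORDS = frozenset({"the", "a", "to", "if", "is", "it", "of", "and", "i", "me"})


def remove_stop_words(words, stop_words=STOP_WORDS):
    """Return a new list with the stop words left out; the input is not modified."""
    return [w for w in words if w not in stop_words]


sample = ["i", "like", "the", "island"]
filtered = remove_stop_words(sample)
print(filtered)  # ['like', 'island']
print(sample)    # ['i', 'like', 'the', 'island'] (unchanged)
```

Because the comparison is made on whole words, "island" keeps its letter "i" while the standalone pronoun "i" is dropped.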
Tim Roberts
  • Thanks, and yes, I do not need the toss elements; I was just trying to minimise flaws in my program. Thanks again for teaching me that I can't iterate over and modify the same list. No one teaches this in class; we pick these things up through experience. – Nitin Kumar Dec 31 '21 at 07:08
from typing import List


def removepunc(z):
    test_str = z
    punc = r'''!()-[]{};:'""\,<>./?@#$%^&*_~'''
    for ele in test_str:
        if ele in punc:
            test_str = test_str.replace(ele, "")
    return test_str


def removebad(f: List):
    print(type(f))
    z = []
    badword2 = {"the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me",
                "my", "we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her",
                "hers", "its", "they", "them", "their", "what", "which", "who", "whom", "this", "that",
                "am", "are", "was", "were", "be", "been", "being", "have", "has", "had", "do", "does",
                "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", "all", "any",
                "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can",
                "will", "just"}
    returned_f = []
    for item in f:
        if item in badword2:
            z.append(item)
        else:
            returned_f.append(item)
    return returned_f


file = open("openfile.txt")
a = file.read()
a = a.lower()
unqword = removepunc(a)
ab = unqword.split()
print(type(ab))
unqword1 = removebad(ab)
print(unqword1)
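
As an aside on `removepunc`: it calls `str.replace` once for every punctuation character it meets, so the whole text is rescanned many times. A possible alternative, sketched here with the same punctuation string, is `str.translate` with a deletion table, which removes all of the characters in a single pass. The names `PUNC`, `_DELETE_PUNC` and `remove_punctuation` are hypothetical.

```python
# Same punctuation characters as in removepunc above.
PUNC = r'''!()-[]{};:'""\,<>./?@#$%^&*_~'''

# str.maketrans with a third argument builds a table that deletes those characters.
_DELETE_PUNC = str.maketrans("", "", PUNC)


def remove_punctuation(text):
    """Return text with every character in PUNC removed."""
    return text.translate(_DELETE_PUNC)


cleaned = remove_punctuation("hello, world! it's (fine).")
print(cleaned)  # hello world its fine
```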

2474101468