I'm trying to make a python script that removes unwanted words and punctuation from a txt file. However, it is not good enough for the grader?

Question

I'm trying to make a word cloud, so I need to strip uninteresting words and punctuation from a txt file. The grader isn't giving me any feedback. I think my script removes some extra words, and I can't figure out why. Can someone point me in the right direction?

punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me", "my", \
"we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
"their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
"have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
"all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

def count(file_contents):
    frequencies = {}
    word_list = file_contents.split()
    final_list = []
    #remove all uninteresting words
    for word in word_list:
    
        new_word = ""
        for character in word:
            if character not in punctuations and character.isalpha():
                new_word += character
            
        if word.lower() not in uninteresting_words:
            final_list.append(new_word)
        
    for word in final_list:
        if word not in frequencies:
            frequencies[word] = 0 
        frequencies[word] += 1
    return frequencies

Your punctuation removal is never going to work, because you copy your edited word into `new_word` (at `new_word = word.replace(character, "")`) and then later replace `new_word` with the original word again (at `new_word = word`). As a result, you're removing all words with punctuation (as the punctuation is still there when you test them with `.isalpha()`). — Nick is tired, Jun 25 '22 at 01:11
Adding to what @NickstandswithUkraine said, you never updated the file. — Codeman, Jun 25 '22 at 01:18
Check out: [Creating a list of every word from a text file without spaces, punctuation](https://stackoverflow.com/questions/18135967/creating-a-list-of-every-word-from-a-text-file-without-spaces-punctuation) — DarrylG, Jun 25 '22 at 01:23
Why is it that when I remove the line `new_word = word`, it gives me an `UnboundLocalError: local variable 'new_word' referenced before assignment`? Am I not assigning it a value at `new_word = word.replace(character, "")`? The if statement `if new_word not in...` runs after the for loop, right? So `new_word` should be assigned. Am I missing something? — Haroon Atif, Jun 25 '22 at 01:36
@HaroonAtif You do that (the `new_word = ...`) in an `if` statement (`if character in punctuations`), what do you think would happen if `word` _doesn't_ contain any punctuation? — Nick is tired, Jun 25 '22 at 01:38
Ok I just removed `new_word` altogether. I think this is better but it's still not correct. — Haroon Atif, Jun 25 '22 at 01:54
`.replace` and `.strip` don't happen "in place", so you'd still need `word = word.replace(...)` and `word = word.strip(...)` — Nick is tired, Jun 25 '22 at 01:56
I changed some stuff and turned it into a function, but it still isn't correct. I found different code online at [link](https://jovian.ai/kzaman3055/final-project-word-cloud), but that isn't correct either. — Haroon Atif, Jun 25 '22 at 02:52
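
The two pitfalls raised in the comments above can be reproduced in isolation. This is a minimal sketch, not the code from the question: `last_punctuation_removed` is a hypothetical helper invented for illustration, and its short `punctuation` default is an assumption. It shows that `str.replace` returns a new string instead of changing the original, and that a name assigned only inside an `if` stays unbound when the condition is never true.

```python
word = "hello,"

# str.replace returns a new string; the original is left unchanged.
word.replace(",", "")
unchanged = word  # still "hello,"

# Rebinding the name is what keeps the result.
word = word.replace(",", "")


def last_punctuation_removed(text, punctuation=",.!?"):
    """Hypothetical helper that reproduces the pitfall from the comments."""
    for character in text:
        if character in punctuation:
            # cleaned is only bound if this branch runs at least once.
            cleaned = text.replace(character, "")
    return cleaned  # raises UnboundLocalError when text has no punctuation


try:
    last_punctuation_removed("hello")
    error_raised = False
except UnboundLocalError:
    error_raised = True
```

Initialising the result before the loop (for example `cleaned = text`) and then updating that same variable inside the loop avoids both problems.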

score 0 · Answer 1 · answered Jun 25 '22 at 03:28

I don't know whether I was getting the answer wrong because I was uploading it incorrectly or because Coursera was bugged. I submitted this code in two different ways: first by clicking "Submit assignment" directly, and then by downloading the notebook and submitting it through Coursera. The second way worked and gave me the correct answer. Regardless, this is correct code.

# Here is a list of punctuations and uninteresting words you can use to process your text
punctuations = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
uninteresting_words = ["the", "a", "to", "if", "is", "it", "of", "and","or", "an","in", "as", "i", "me", "my", \
"we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her", "hers", "its", "they", "them", \
"their", "what", "which", "who", "whom", "this", "that", "am", "are", "was", "were", "be", "been", "being", \
"have", "has", "had", "do", "does", "did", "but", "at", "by", "with", "from", "here", "when", "where", "how", \
"all", "any", "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can", "will", "just"]

# LEARNER CODE START HERE

# file_contents is assumed to hold the text of the input file as a string
file_no_punct = ""

# keep only letters and whitespace, which drops punctuation and digits
for char in file_contents:
    if char.isalpha() or char.isspace():
        file_no_punct += char

boring_list = file_no_punct.split()
zesty_list = []
# remove all uninteresting words (compared case-insensitively)
for word in boring_list:
    if word.lower() not in uninteresting_words and word.isalpha():
        zesty_list.append(word)

# count how often each remaining word occurs
frequencies = {}
for word in zesty_list:
    if word not in frequencies:
        frequencies[word] = 0
    frequencies[word] += 1
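
For comparison with the function in the question, here is a minimal sketch of the same approach wrapped in a function. The name `calculate_frequencies`, the use of `collections.Counter`, and the shortened word list are assumptions for illustration and are not taken from the assignment. The main difference from the question's version is that the uninteresting-word check runs on the cleaned word, so `"the,"` is filtered out as `"the"` instead of slipping through.

```python
from collections import Counter

# Shortened list for illustration only; the assignment supplies a longer one.
UNINTERESTING_WORDS = {"the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "in"}


def calculate_frequencies(file_contents):
    """Count interesting words, ignoring punctuation and digits."""
    # Keep only letters and whitespace, as in the answer above.
    cleaned = "".join(ch for ch in file_contents if ch.isalpha() or ch.isspace())

    # Filter on the cleaned, lower-cased word, but count the word as written.
    words = [w for w in cleaned.split() if w.lower() not in UNINTERESTING_WORDS]
    return dict(Counter(words))


sample = "The cat, the hat: is it a CAT? It's a cat!"
result = calculate_frequencies(sample)
```

Like the answer's code, this sketch counts `"cat"` and `"CAT"` as separate words; whether the grader expects case-sensitive counts is not stated here.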
