stripping punctuation and finding unique words in Python

Question

My task is as follows:

Write a program that displays a list of all the unique words found in the file uniq_words.txt. Print your results in alphabetic order and lowercase. Hint: Store words as the elements of a set; remove punctuations by using the string.punctuation from the string module.

Currently, the code that I have is:

def main():
    import string

    with open('uniq_words.txt') as content:
        new = sorted(set(content.read().split()))
        for i in new:
            while i in string.punctuation:
                new.discard(i)
                print(new)

main()

If I run the code as is, it goes into an infinite loop, printing the unique words over and over again. There are still words in my set that appear as, e.g., "value." or "never/have". How do I remove the punctuation using string.punctuation? Or am I approaching this from the wrong direction? I would appreciate any advice!

Edit: The link does not help me, because the method given there does not work on a list.

Does this answer your question? [Best way to strip punctuation from a string](https://stackoverflow.com/questions/265960/best-way-to-strip-punctuation-from-a-string) — GiftZwergrapper, Oct 31 '19 at 13:20
@GiftZwergrapper I've actually read through that post previously, but I don't think it solves my question here. — aislinx, Oct 31 '19 at 13:41

Tinu · Answer 1 · 2019-10-31T14:08:41.123

2

My solution:

import string

with open('sample_string.txt') as content:
    sample_string = content.read()

print(sample_string)
# Sample string: containing punctuation! As well as CAPITAL LETTERS and duplicates duplicates.

# Delete every punctuation character, then lowercase the text.
sample_string = sample_string.translate(str.maketrans('', '', string.punctuation)).lower()
# split() with no argument splits on any whitespace (spaces, tabs, newlines).
out = sorted(set(sample_string.split()))
print(out)
# ['and', 'as', 'capital', 'containing', 'duplicates', 'letters', 'punctuation', 'sample', 'string', 'well']
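
One caveat with deleting punctuation outright: a token such as "never/have" from the question collapses into "neverhave". Below is a minimal sketch of a variant that maps each punctuation character to a space instead, so such tokens split into separate words. The helper name unique_words is hypothetical, and treating "never/have" as two words is an assumption about what the assignment expects.

```python
import string

# Hypothetical helper, not part of the original answer.
# Translation table that maps every punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def unique_words(text):
    """Return the sorted, lowercase unique words found in text."""
    cleaned = text.translate(_PUNCT_TO_SPACE).lower()
    # split() with no argument splits on any run of whitespace.
    return sorted(set(cleaned.split()))


print(unique_words("A value. Never/have a VALUE!"))
# ['a', 'have', 'never', 'value']
```

The trade-off is that apostrophes are also replaced, so a contraction like "don't" would be split into "don" and "t".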

edited Oct 31 '19 at 14:08

answered Oct 31 '19 at 13:19


Hi, thanks for your comment. Currently I'm getting this error: AttributeError: 'list' object has no attribute 'translate' when I try to run the code. – aislinx Oct 31 '19 at 13:47
That's probably due to the way you read your file. Try this above the line creating the error: `sample_text = ' '.join(sample_text)`, that should convert your list into a string. – Tinu Oct 31 '19 at 13:50
` def main(): import string with open('uniq_words.txt') as content: new1 = content.read().split() new2= ' '.join(new1) nopunc = new2.translate(str.maketrans('', '', string.punctuation)).lower() out = sorted(list(set(nopunc,split(" ")))) main() ` is this what you mean? – aislinx Oct 31 '19 at 13:55

Jonathan Scholbach · Answer 2 · 2019-10-31T13:22:06.320

This is actually two tasks, so let's split it into two questions. I will deal with the problem of stripping punctuation, because you have shown your own effort there. For the problem of determining unique words, please open a new question (and look for similar questions here on Stack Overflow before posting; I am pretty sure you will find something useful!).

You correctly noticed that you end up in an infinite loop. This is because the condition of your while loop stays true once i is a punctuation character: removing i from new does not change i itself. You can avoid this by using a simple if condition. Your code mixes up while and if, and your scenario calls for an if statement. I think you reached for a while loop because you had iteration in mind, but you are already iterating over new in the for loop. So the bug fix would be:

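for i in list(new):  # iterate over a copy, because new is modified inside the loop
    if i in string.punctuation:
        new.remove(i)  # new is a list (sorted() returns one), so use remove(), not discard()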
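
Note that `i in string.punctuation` is a substring test: it is true only when the whole token i occurs inside the punctuation string, so a word such as "value." is never matched. Here is a small sketch of the difference, assuming the goal is to clean each word rather than drop whole tokens. The sample words are made up for illustration.

```python
import string

words = ["value.", "never/have", "-", "Hello"]

# Substring test: only tokens made up entirely of punctuation match.
matches = [w in string.punctuation for w in words]
print(matches)
# [False, False, True, False]

# str.strip() with a character set removes leading and trailing punctuation only.
stripped = [w.strip(string.punctuation) for w in words]
print(stripped)
# ['value', 'never/have', '', 'Hello']
```

Stripping handles "value." but leaves the slash inside "never/have" untouched, which is why filtering or translating individual characters is the more robust route.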

However, a different and more "pythonic" way would be to use a list comprehension instead of a for loop, filtering the punctuation characters out of the text itself:

import string

with open("uniq_words.txt") as content:
    # Keep every character that is not punctuation and rejoin them into one string.
    stripped_content = "".join([
        x
        for x in content.read()
        if x not in string.punctuation
    ])
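
To get from stripped_content to the list the assignment asks for, the remaining steps are lowercasing, splitting on whitespace, deduplicating with a set, and sorting. This is a minimal sketch under the assumption that the text is already in memory. The sample string is a stand-in for the contents of uniq_words.txt, which the question does not show.

```python
import string

# Stand-in for content.read(); the real file contents are not shown in the question.
text = "The value. the VALUE, never!\nNever have I."

# Same filtering idea as above, written as a generator expression.
stripped_content = "".join(ch for ch in text if ch not in string.punctuation)

# Lowercase, split on any whitespace, deduplicate, and sort alphabetically.
unique_sorted = sorted(set(stripped_content.lower().split()))
print(unique_sorted)
# ['have', 'i', 'never', 'the', 'value']
```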

Your more "pythonic" way doesn't work. This code has invalid syntax.. — GiftZwergrapper, Oct 31 '19 at 13:17
@GiftZwergrapper You are right, I was a bit too quick and it was a slip of the pen. I updated it and it works now. — Jonathan Scholbach, Oct 31 '19 at 13:23
