How to use string.punctuation to remove punctuation in a text file

Question

I made a function to count the 20 most common words in a book that I downloaded in plain-text format. The Python textbook I'm following says to `import string` and then use the `replace` or `translate` method to remove punctuation, but when I print the lines after the replace step, they all still contain punctuation. I tried moving the `line = line.strip()` and `line = line.replace(string.punctuation,'')` steps around, but that didn't help. I've never used `replace` before, so I may be using it wrong. The rest of my program works; only this step is frustrating me.

```python
import string
def function():
    infile = open('gutbook.txt','r',encoding='utf-8')
    count = dict()
    list2 = list()
    for line in infile:
        line = line.strip()
        line = line.replace(string.punctuation,'')
        line = line.lower().split()
        if line== []:
            continue
        for i in line:
            count[i] = count.get(i,0) + 1
    for key,value in count.items():
        newtuple = (value,key)
        list2.append(newtuple)
    list3 = sorted(list2,reverse = True)
    print(list3[:20])


function()
```

Your bug is using `line.replace(string.punctuation,'')`, which looks for the *entire* `string.punctuation` sequence (all of those characters, in that exact order) as one substring and removes only that, not *each* individual character. — ShadowRanger, Jun 01 '18 at 21:20
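A minimal sketch of that difference, using only the standard library and a made-up sample line: `str.replace` finds nothing to remove because the whole punctuation sequence never appears verbatim, while `str.translate` with a table built by `str.maketrans('', '', string.punctuation)` (the `translate` route the textbook mentions) deletes each punctuation character individually.

```python
import string

line = "Hello, world! It's a test."

# replace() searches for the entire string.punctuation sequence as a single
# substring; ordinary text never contains it, so nothing is removed.
replaced = line.replace(string.punctuation, "")

# maketrans('', '', chars) maps every character in `chars` to None, and
# translate() then deletes each of those characters wherever it appears.
table = str.maketrans("", "", string.punctuation)
translated = line.translate(table)

print(replaced)    # Hello, world! It's a test.
print(translated)  # Hello world Its a test
```

In the question's loop, the equivalent change would be building `table` once before the loop and using `line = line.translate(table)` in place of the `replace` call.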

Rakesh · Answer 1 · 2018-06-01T21:28:32.410


Use a regular expression with `re.sub`.

Ex:

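```python
import re
import string

# Raw string so the "\" in "[\]" is kept literally (no invalid-escape warning)
text = r"Hello ! #$%&'()*+,-./:;<=>?@[\]^_`{|}~ World"

# Option 1: a character class containing every character in string.punctuation
print(re.sub("[" + re.escape(string.punctuation) + "]", "", text))

# Option 2: drop everything that is not an ASCII letter, digit, or whitespace
# (this also strips non-ASCII letters such as "é")
print(re.sub(r"[^a-zA-Z0-9\s]", "", text))
```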
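To plug the first pattern into the question's word-count loop, a sketch might look like the following. It keeps the `re.escape`-based character class, but swaps the manual dict-and-sort for `collections.Counter.most_common`; that swap is a substitution, not what the asker or answerer wrote. The names `top_words` and `main`, and taking an iterable of lines instead of opening the file inside the counting function, are hypothetical choices made so the counting logic can be tried without the book file.

```python
import re
import string
from collections import Counter

# Compiled once: a character class matching any single ASCII punctuation character
PUNCT_RE = re.compile("[" + re.escape(string.punctuation) + "]")


def top_words(lines, n=20):
    """Return the n most common lowercase words as (word, count) pairs."""
    counts = Counter()
    for line in lines:
        # Strip punctuation first, then lowercase and split on whitespace;
        # blank lines simply produce an empty list and add nothing.
        words = PUNCT_RE.sub("", line).lower().split()
        counts.update(words)
    return counts.most_common(n)


def main(path="gutbook.txt"):
    # 'gutbook.txt' is the file name used in the question
    with open(path, encoding="utf-8") as infile:
        print(top_words(infile))
```

Calling `main()` prints the 20 most common words. Note that `most_common` returns `(word, count)` pairs rather than the question's `(count, word)` tuples, and words with equal counts may come out in a different order than `sorted(..., reverse=True)` would give.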

edited Jun 01 '18 at 21:28

answered Jun 01 '18 at 21:03

Rakesh

You've got a subtle bug here that can be fixed by wrapping `string.punctuation` in [`re.escape`](https://docs.python.org/3/library/re.html#re.escape), e.g. `re.sub("[" + re.escape(string.punctuation) + "]", "", text)`. Without escaping it, the pattern won't treat `\` as punctuation: it interprets the `\` as escaping the `]` in `string.punctuation`, which prevents everything exploding, but also omits `\` from the set of characters to match. – ShadowRanger Jun 01 '18 at 21:25
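A small check of that point, assuming Python 3's `re` module and a made-up sample string: with the unescaped pattern, `[` and `]` are still removed but the backslash survives, while the `re.escape` version removes all three.

```python
import re
import string

text = r"back\slash [brackets]"

# Unescaped: inside the character class, "\]" is read as a literal "]",
# so the backslash itself never becomes a member of the set.
unescaped = re.sub("[" + string.punctuation + "]", "", text)

# Escaped: re.escape() backslash-escapes the regex-special characters, so every
# punctuation character, including "\", is a literal member of the set.
escaped = re.sub("[" + re.escape(string.punctuation) + "]", "", text)

print(unescaped)  # back\slash brackets
print(escaped)    # backslash brackets
```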
@ShadowRanger. Thank you so much. – Rakesh Jun 01 '18 at 21:28