This is my string:
mystring = "How’s it going?"
This is what I did:
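import string
exclude = set(string.punctuation)
def strip_punctuations(mystring):
    # Drop every ASCII punctuation character listed in string.punctuation
    new_string = ''.join(ch for ch in mystring if ch not in exclude)
    # Manually remove the UTF-8 bytes of the curly apostrophe (U+2019)
    new_string = new_string.replace("\xe2\x80\x99", "")
    # Manually remove doubled non-breaking spaces (U+00A0 as UTF-8 bytes)
    new_string = new_string.replace("\xc2\xa0\xc2\xa0", "")
    return new_string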
OUTPUT:
If I leave out the line that replaces "\xe2\x80\x99", this is the output:
'How\xe2\x80\x99s it going'
I realized that exclude does not contain that odd-looking curly apostrophe:
print set(exclude)
set(['!', '#', '"', '%', '$', "'", '&', ')', '(', '+', '*', '-', ',', '/', '.', ';', ':', '=', '<', '?', '>', '@', '[', ']', '\\', '_', '^', '`', '{', '}', '|', '~'])
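For reference, here is a small check of why the curly apostrophe slips through. It is only a sketch, and the variable names are mine, not from the code above. It assumes the character is U+2019 (the right single quotation mark), which is what the bytes \xe2\x80\x99 decode to as UTF-8.

```python
import string
import unicodedata

# Assumption: the apostrophe in "How’s" is U+2019.
curly = u"\u2019"

# string.punctuation only lists ASCII punctuation, so the curly quote is absent.
in_ascii_list = curly in string.punctuation

# Encoded as UTF-8, the character becomes the three bytes seen in the output.
utf8_bytes = curly.encode("utf-8")

# Unicode itself does classify it as punctuation (category "Pf", final quote).
category = unicodedata.category(curly)

print(in_ascii_list)
print(category)
```

So the character is punctuation as far as Unicode is concerned, it is just not part of the ASCII-only list in string.punctuation.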
How do I ensure that all such characters are removed, instead of manually replacing each one in the future?
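One direction that might work is to filter on Unicode character categories rather than on a fixed list. The sketch below is not tested against my real data, and strip_unicode_punctuation is a hypothetical helper name. It assumes byte strings contain UTF-8 text, and it drops both the "P" (punctuation) and "S" (symbol) categories, because some string.punctuation characters such as $, + and ~ are classified as symbols.

```python
import unicodedata

def strip_unicode_punctuation(text, encoding="utf-8"):
    """Hypothetical helper: drop characters Unicode marks as punctuation or symbols."""
    # Assumption: byte strings hold UTF-8 text, as the \xe2\x80\x99 bytes suggest.
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # General categories starting with "P" are punctuation (curly quotes included);
    # those starting with "S" are symbols such as $, +, < and ~.
    return u"".join(
        ch for ch in text
        if unicodedata.category(ch)[0] not in ("P", "S")
    )

print(strip_unicode_punctuation(u"How\u2019s it going?"))  # prints: Hows it going
```

Non-breaking spaces (U+00A0) fall under the separator category "Zs", so this filter leaves them alone. They would still need separate handling, for example replacing them with a regular space.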