I need to remove punctuation from a Unicode string. I've read a few posts, and the most recommended approach was this one.
I've implemented the following (Python 2):

```python
import sys
import unicodedata

# Map every code point whose Unicode category starts with 'P' (punctuation) to None
table = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P'))

def tokenize(message):
    message = unicode(message, 'utf-8').lower()
    #print message
    message = remove_punctuation_unicode(message)
    return message

def remove_punctuation_unicode(string):
    return string.translate(table)
```
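For comparison, here is a minimal sketch of the same translate-table technique, assuming Python 3, where `str` is already Unicode text and `chr` returns a one-character `str`. `PUNCTUATION_TABLE` is an illustrative name, not from the original. `dict.fromkeys` maps each punctuation code point to `None`, and `str.translate` deletes every character mapped to `None`.

```python
import sys
import unicodedata

# Map every code point whose general category starts with "P"
# (Pc, Pd, Ps, Pe, Pi, Pf, Po) to None; str.translate deletes those characters.
# range() stops before its argument, so + 1 includes sys.maxunicode itself.
PUNCTUATION_TABLE = dict.fromkeys(
    i for i in range(sys.maxunicode + 1)
    if unicodedata.category(chr(i)).startswith("P")
)


def remove_punctuation_unicode(text):
    """Return text with all Unicode punctuation characters removed."""
    return text.translate(PUNCTUATION_TABLE)


def tokenize(message):
    """Decode UTF-8 bytes if needed, lowercase, then strip punctuation."""
    if isinstance(message, bytes):
        message = message.decode("utf-8")
    return remove_punctuation_unicode(message.lower())


print(tokenize("Hello, World! ¿Qué tal?".encode("utf-8")))  # hello world qué tal
```

The table only covers punctuation categories, so symbols such as `$` (`Sc`) or `+` (`Sm`) are left in place. Building it scans every code point, which is why it makes sense to build it once at module level, as the original code does.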
But when I run the code, I get this error:

```
table = dict.fromkeys(i for i in range(sys.maxunicode) if unicodedata.category(chr(i)).startswith('P'))
TypeError: must be unicode, not str
```
I can't quite figure out what to do. Can someone tell me how to fix this?
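The traceback points at `unicodedata.category(chr(i))`. In Python 2, `chr(i)` returns a byte string (`str`), whereas `unicodedata.category` expects a single `unicode` character; Python 2's `unichr(i)` returns one. The same text-versus-bytes rule can be seen in Python 3, where passing `bytes` to `unicodedata.category` is likewise rejected with a `TypeError`. A small sketch, assuming Python 3 (the `rejected` flag is illustrative only):

```python
import unicodedata

# unicodedata.category() accepts exactly one *text* character.
print(unicodedata.category(chr(0x21)))  # '!' is punctuation, so this prints Po

# A bytes object is not text, so it is rejected, analogous to Python 2's
# "TypeError: must be unicode, not str" when passing chr(i).
rejected = False
try:
    unicodedata.category(b"!")
except TypeError as exc:
    rejected = True
    print("TypeError:", exc)
```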