I am trying to remove the punctuation from my nested and tokenized list. I have tried several different approaches to this, but to no avail. My most recent attempt looks like this:
def tokenizeNestedList(listToTokenize):
    flat_list = [item.lower() for sublist in paragraphs_no_guten for item in sublist]
    tokenList = []
    for sentence in flat_list:
        sentence.translate(str.maketrans(",",string.punctuation))
        tokenList.append(nltk.word_tokenize(sentence))
    return tokenList
As you can see, I'm trying to remove the punctuation as I tokenize the list, since the list is being traversed anyway when my function is called. However, with this approach I get the error:
ValueError: the first two maketrans arguments must have equal length
I sort of understand why this happens. Running my code without trying to remove punctuation and printing the first 10 elements gives me the following (so you have an idea of what I'm working with):
[[], ['title', ':', 'an', 'inquiry', 'into', 'the', 'nature', 'and', 'causes', 'of', 'the', 'wealth', 'of', 'nations'], ['author', ':', 'adam', 'smith'], ['posting', 'date', ':', 'february', '28', ',', '2009', '[', 'ebook', '#', '3300', ']'], ['release', 'date', ':', 'april', ',', '2002'], ['[', 'last', 'updated', ':', 'june', '5', ',', '2011', ']'], ['language', ':', 'english'], [], [], ['produced', 'by', 'colin', 'muir']]
Any and all advice is appreciated.
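
For reference, here is a minimal sketch of one way the punctuation removal could work. It is not the only approach. The two-argument form of str.maketrans maps characters one-to-one, so both strings must have the same length, which is what the ValueError is complaining about. The three-argument form takes the characters to delete as its third argument. Also, str.translate returns a new string rather than modifying the original, so its result has to be kept. The names tokenize_nested_list, PUNCT_TABLE, and the tokenize parameter are made up for this illustration, and the sketch assumes the input is a list of lists of strings. It defaults to str.split so it runs without NLTK; passing nltk.word_tokenize as tokenize should keep the original tokenizer.

```python
import string

# Three-argument form: the third argument lists the characters to delete.
# string.punctuation covers ASCII punctuation only.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize_nested_list(nested, tokenize=str.split):
    """Lowercase, strip ASCII punctuation, and tokenize each string.

    Assumes `nested` is a list of lists of strings. `tokenize` is a
    hypothetical hook: pass nltk.word_tokenize to use NLTK instead.
    """
    token_list = []
    for sublist in nested:
        for sentence in sublist:
            # translate() returns a new string, so keep the result.
            cleaned = sentence.lower().translate(PUNCT_TABLE)
            token_list.append(tokenize(cleaned))
    return token_list


sample = [["Title: An Inquiry"], ["Posting Date: February 28, 2009 [EBook #3300]"]]
print(tokenize_nested_list(sample))
# [['title', 'an', 'inquiry'], ['posting', 'date', 'february', '28', '2009', 'ebook', '3300']]
```

Removing the punctuation before tokenizing means tokens such as ':' and '[' never appear in the output. Strings that contain nothing else still produce an empty list, so the empty sublists seen above would remain unless they are filtered out separately.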