I've converted a column from a CSV to a list, and then to a string for tokenization. After the conversion to a string, I get '\n' scattered throughout. I'm looking to either prevent that from happening in the first place, or to remove it after the fact.
So far, I've tried replace, strip, and rstrip, to no avail.
Here's a version where I tried .replace() after converting the list to a string.
import pandas as pd
import nltk

df = pd.read_csv('raw_da_qs.csv')
question = df['question_only']
# strip digits from each row (regex=True needed in recent pandas)
question = question.str.replace(r'\d+', '', regex=True)
# convert the whole Series to one string for tokenization
question = str(question.tolist())
# attempt to remove the newlines -- this is the part that doesn't work
question = question.replace('\n', '')
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(question)
and I end up with tokens like 'nthere' and 'nsuicide'.
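I think the cause is that `str(question.tolist())` produces the list's repr, which escapes each real newline into two characters: a backslash followed by an `n`. So `replace('\n', '')`, which targets an actual newline character, finds nothing, and the tokenizer's `\w+` then glues the stray `n` onto the next word. A minimal sketch of both the symptom and a fix, using stdlib `re.findall(r'\w+', ...)` in place of NLTK's `RegexpTokenizer` (same pattern, same splits) and made-up sample rows standing in for the CSV column:

```python
import re

# Hypothetical rows standing in for df['question_only'].tolist()
rows = ['Why do cats purr?\nthere were three', 'What about\nsuicide rates?']

# str() on a list gives its repr: each real newline becomes the two
# characters backslash + 'n', so replace('\n', '') finds nothing to remove
joined = str(rows)
bad_tokens = re.findall(r'\w+', joined)
# bad_tokens contains 'nthere' and 'nsuicide' -- the leftover 'n'
# is a word character, so it sticks to the following word

# Fix: clean the newlines while the items are still real strings,
# then join them yourself instead of taking the list's repr
text = ' '.join(r.replace('\n', ' ') for r in rows)
good_tokens = re.findall(r'\w+', text)
# good_tokens contains 'there' and 'suicide' with no stray 'n'
```

In pandas terms that means doing the cleanup on the Series before converting, e.g. `' '.join(question.str.replace('\n', ' ', regex=False))`, rather than calling `str()` on the list.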