I am reading a CSV file with a single column containing text data. Since the file was not UTF-8 encoded, I hit an encoding error and tried the following two solutions:
Solution 1:
df = pd.read_csv("data_encoded.csv", encoding='latin-1')
Solution 2:
I converted the file to UTF-8 explicitly and then used
df = pd.read_csv("data_encoded.csv")
Both solutions resolved the error, but I am now getting garbage values. For example:
me pretty (changed to) => me\r\rpretty
I noticed "\r" appended to most of the words when I tokenized them. Is there a Pythonic way to remove these?
I have implemented post-hoc cleanups such as:
re.sub / str.replace
filters based on "\r"
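For reference, the cleanup I have so far looks roughly like this (the sample data is made up to mirror the garbled values above):

```python
import pandas as pd

# Hypothetical sample mimicking the garbled tokens described above.
df = pd.DataFrame({"text": ["me\r\rpretty", "hello\rworld"]})

# Post-hoc cleanup: collapse runs of stray carriage returns into a single space.
df["text"] = df["text"].str.replace(r"\r+", " ", regex=True)
print(df["text"].tolist())  # → ['me pretty', 'hello world']
```

This works, but it runs over every row after the fact, which is what I would like to avoid.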
However, I am looking for a way to prevent these garbage values from forming in the first place. Any suggestions would be helpful.