I crawled the data and had to save the dataframe as UTF-16 (Unicode), because the Latin/Spanish words were displayed incorrectly when saved as UTF-8. I used the following code to save the dataframe:
df.to_csv("blogdata.csv", encoding = "utf-16", sep = "\t", index = False)
When I try to read the file to clean the data using the following code:
blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')
it shows the following error:
UnicodeDecodeError                        Traceback (most recent call last)
in ()
----> 1 blogdata = pd.read_csv('C:/Users/hyoungm/Downloads/blogdata.csv')
...
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()
pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._get_header()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
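A note on what this message suggests: byte 0xff at position 0 is consistent with the byte order mark that Python's "utf-16" codec writes at the start of a file, which is not a valid first byte in UTF-8. The sketch below reproduces that situation with the standard library only. The sample rows and the temporary file name are made up for illustration and are not the real blog data.

```python
import csv
import os
import tempfile

# Hypothetical sample rows standing in for the blog data (not the real data).
rows = [["title", "text"], ["post1", "Feliz año nuevo, señor"]]

# Hypothetical file name in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "blogdata_demo.csv")

# Write tab-separated text as UTF-16, mirroring encoding="utf-16", sep="\t".
with open(path, "w", encoding="utf-16", newline="") as f:
    csv.writer(f, delimiter="\t").writerows(rows)

with open(path, "rb") as f:
    raw = f.read()

# The "utf-16" codec writes a byte order mark first:
# FF FE (little-endian) or FE FF (big-endian).
has_bom = raw[:2] in (b"\xff\xfe", b"\xfe\xff")

# Decoding those bytes as UTF-8 fails, because 0xff and 0xfe
# can never start a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Reading with the matching encoding and delimiter recovers the rows.
with open(path, "r", encoding="utf-16", newline="") as f:
    read_back = list(csv.reader(f, delimiter="\t"))
```

If the same reasoning applies to the pandas case, the read would presumably need to mirror the write settings, i.e. pd.read_csv(path, encoding="utf-16", sep="\t"); both encoding and sep are existing read_csv parameters, but this has not been checked against the original file.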
I don't know how to save the original data without losing the Latin/Spanish words within the English sentences, or how to read a Unicode data file. Can anybody please help me solve this issue?
Thank you very much!