I am trying to read in a file containing text with em dashes and/or en dashes (not to be confused with the regular hyphen-minus). Every time I read this CSV, the dashes turn into the replacement character (�). If I try to encode or decode the file, I just get error messages saying UTF-8 can't decode the dashes. Should I just write the CSV file from Python instead? This seems like a really dumb problem that should be easy to fix.
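For context, here is a minimal stdlib-only reproduction of that kind of error. It assumes, without having confirmed it, that the file was saved as Windows-1252 (cp1252). In that encoding the en dash and em dash are the single bytes 0x96 and 0x97, and neither byte is a valid start byte in UTF-8:

```python
# Assumption (unconfirmed): the CSV was saved in cp1252, not UTF-8.
text = "2010\u20132020 \u2014 done"  # en dash (U+2013) and em dash (U+2014)
raw = text.encode("cp1252")
print(raw)  # b'2010\x962020 \x97 done'

# Decoding those bytes as UTF-8 (the usual default) fails on the first dash.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc
print(error)  # 'utf-8' codec can't decode byte 0x96 in position 4: invalid start byte
```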
My code is:
import pandas as pd

df = pd.read_csv('csv file with em dash or en dash')
print(df)
My output is:
col_name
� �
I have tried replacing the dashes after the file has been read in, but that isn't working. I have also tried replacing the replacement character, but that hasn't worked either. Ideally, the dashes would just show up as they are in the CSV file. I think it has something to do with how the file is being read into Python, but whenever I try an encoder/decoder, I just get errors that the dashes aren't supported.
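A second sketch, under the same unconfirmed cp1252 assumption, of why replacing � after the fact can't recover the dashes: a lossy UTF-8 decode maps both the en dash and the em dash to the same U+FFFD, so the distinction is already gone. Decoding the same bytes with the matching codec keeps them intact:

```python
import csv
import io

# Assumption (unconfirmed): the CSV bytes are cp1252, e.g. saved by Excel on Windows.
raw = "col_name\n2010\u20132020 \u2014 done\n".encode("cp1252")

# Lossy decode: every invalid byte becomes U+FFFD, so en and em dash look identical.
lossy = raw.decode("utf-8", errors="replace")
print(lossy)

# Decoding with the codec the file was written in preserves both dashes.
rows = list(csv.reader(io.StringIO(raw.decode("cp1252"))))
print(rows)
```

If that assumption holds, the pandas counterpart would presumably be passing `encoding="cp1252"` to `pd.read_csv`; the file's actual encoding still needs to be checked.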