Let's say I have:
- A proprietary Python library that reads a file in 'Latin-1'. I can't change the way it's read.
- As a result, a dataFrame1 is generated, where one of the values is meant to be stored as "Column€", but I can see from the debugger that it's stored as 'Column\x80'.
- I need to match this text value to a dataFrame2 (e.g. use "Column€" as a key for joining some data), and that second data frame is originally encoded in 'utf-8', e.g. "Column€". I am not able to change the input encoding here either.
Basically, I want both data frames to store the same "Col€" value so that I can use it as a unique key to join them.
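To make the mismatch concrete, here is a minimal sketch in plain Python 3, without pandas. It assumes the file's bytes are really Windows-1252 (cp1252), where byte 0x80 is the euro sign. The variable names are made up for illustration, and the proprietary library is only simulated by a plain Latin-1 decode.

```python
# Assumption: the file on disk is Windows-1252, where byte 0x80 means the euro sign.
raw = "Column\u20ac".encode("cp1252")       # b'Column\x80'

# Simulates the library: decoding as Latin-1 maps 0x80 to the control character U+0080.
key_from_library = raw.decode("latin-1")

# Simulates the second data frame: the same text read correctly from UTF-8 bytes.
key_from_utf8 = "Column\u20ac".encode("utf-8").decode("utf-8")

# ascii() keeps the output readable on any console encoding.
print(ascii(key_from_library))              # 'Column\x80'
print(ascii(key_from_utf8))                 # 'Column\u20ac'
print(key_from_library == key_from_utf8)    # False, so a join on this key finds no match
```

If this sketch matches the real situation, the two keys differ in their last code point (U+0080 versus U+20AC), which would explain why the join fails.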
I tried x.encode('utf-8'), but it returns 'Col?'. Decoding like x.decode('latin1').encode('utf-8') didn't work either (there are quite a lot of variations of it here on StackOverflow).
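One variation that may behave differently, sketched here for Python 3 and not checked against the real file: encode the string back to its original bytes with Latin-1, then decode those bytes as Windows-1252. This only helps under the assumption that the file is actually cp1252 rather than true Latin-1. The helper name redecode is hypothetical.

```python
def redecode(value: str) -> str:
    """Undo a Latin-1 decode and reinterpret the bytes as cp1252.

    Assumption: the source bytes were Windows-1252. Bytes that cp1252
    leaves undefined would raise UnicodeDecodeError here.
    """
    # Latin-1 maps code points 0-255 one-to-one to bytes, so this recovers the raw bytes.
    original_bytes = value.encode("latin-1")
    return original_bytes.decode("cp1252")


fixed = redecode("Column\x80")
print(ascii(fixed))                  # 'Column\u20ac'
print(fixed == "Column\u20ac")       # True
```

On a data frame, the same function could presumably be applied to the key column, for example with Series.map, before joining.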
My gut feeling is that I'm missing some fundamental encoding knowledge. :) What else could I try?