I have a .csv file with 4 columns: 2 integer columns, 1 byte array column and a date column. The byte array column contains binary text that I need to decode to a normal UTF-8 string.
Here is what my .csv looks like:
id1 id2 text date
1 2 0x202020312045584D4F2841292E205 2020-01-01
3 4 0x20312020455843454C454E 2020-05-01
When I simply use pd.read_csv():
df = pd.read_csv(file_path + file_name)
output:
id1 id2 text date
24228 35649098 0x202020312045584D4F2841292E2 2020-05-04
24298 97780137 0x20312020455843454C454E54C38 2020-05-04
df.info():
id1 994 non-null int64
id2 994 non-null int64
text 994 non-null object
date 994 non-null object
However, I need the normal string, so I tried decoding only this column, but I can't make it work. Here is what I have already tried:
Trial 1:
df.loc[:,'transformedText'] = df.text.str.decode('utf-8')
output: the transformedText column is all NaN
Trial 2:
df.loc[:,'transformedText'] = df.text.str.encode('utf-8').str.decode('utf-8')
output: transformedText column keeps the byte array string
Trial 3:
df.loc[:,'transformedText'] = df.text.str.encode('ascii').str.decode('utf-8')
output: transformedText column keeps the byte array string
To investigate the problem further, I checked what happened when I only encoded the string: df.loc[:,'transformedText'] = df.text.str.encode('ascii')
Output: All it does is wrap my string in b'' (e.g. b'0x202020312045584D4F2841292E2')
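To illustrate what Trials 2 and 3 seem to be doing, here is a minimal sketch on a plain Python string. The value "0x2031" is a made-up short sample, not a row from my file. Encoding turns each character of the hex text ('0', 'x', '2', ...) into its own byte, so decoding again just gives back the same text:

```python
# Hypothetical short sample in the same "0x..." format as the text column.
hex_text = "0x2031"

# encode() converts the characters of the string to bytes; it does not
# interpret the hex digits as data.
encoded = hex_text.encode("ascii")
print(encoded)            # b'0x2031'
print(list(encoded[:4]))  # [48, 120, 50, 48] -> the characters '0', 'x', '2', '0'

# Decoding those bytes simply reverses the step above.
round_trip = encoded.decode("utf-8")
print(round_trip == hex_text)  # True
```

This would explain why the transformedText column still shows the original hex string after encode followed by decode.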
I believe the decoding doesn't work because read_csv is not recognizing my column as a byte array column but as a string column, although I am not sure about this.
The output that I need is:
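If that guess is right, each value is a text representation of the bytes in hexadecimal, and one possible approach is to strip the "0x" prefix and convert the hex digits to real bytes before decoding. The following is only a sketch under that assumption: hex_to_text is a hypothetical helper name, and it assumes every value has an even number of hex digits (bytes.fromhex raises ValueError otherwise, so values that were truncated would fail).

```python
def hex_to_text(value):
    """Hypothetical helper: turn a '0x...' hex string into readable text."""
    # Drop the optional "0x" prefix so only hex digits remain.
    hex_digits = value[2:] if value.lower().startswith("0x") else value
    # Convert pairs of hex digits to raw bytes, then decode them as UTF-8.
    # errors="replace" substitutes a marker character for invalid byte sequences.
    return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")


# The second sample row from the .csv shown above.
sample = "0x20312020455843454C454E"
decoded = hex_to_text(sample)
print(repr(decoded))  # ' 1  EXCELEN'
```

On the DataFrame this could presumably be applied with something like df['transformedText'] = df['text'].map(hex_to_text), provided the column has no missing values.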
id1 id2 text date
24228 35649098 A normal string that a human can read 1 2020-05-04
24298 97780137 A normal string that a human can read 2 2020-05-04
Also, I am fairly new to working with binary data, so anything helps!
I have already checked the links below, but couldn't find an answer:
https://www.programiz.com/python-programming/methods/string/encode
https://www.geeksforgeeks.org/convert-binary-to-string-using-python/
https://www.kite.com/python/answers/how-to-convert-binary-to-string-in-python
Convert string to binary in python
Convert bytes to a string