
I am reading data from remote .dat files for EDI data processing.

The original data is a bytes string:

b'MDA1MDtWMjAxOS44LjAuMDtWMjAxOS44LjAuMDsyMDIwMD.........'

I decoded it as below:

import base64
byte_data = base64.b64decode(byte_data)
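
As a quick sanity check of that step, the sketch below decodes only the first 44 characters of the Base64 sample (the rest is elided in the post, and 44 is a multiple of 4, so the prefix decodes cleanly on its own). The variable names are illustrative.

```python
import base64

# Complete 44-character prefix of the Base64 sample shown above; the remainder
# of the original string is elided, so it is not reproduced here.
encoded_prefix = b"MDA1MDtWMjAxOS44LjAuMDtWMjAxOS44LjAuMDsyMDIw"

# b64decode turns every 4 Base64 characters into 3 raw bytes (44 -> 33 bytes)
raw = base64.b64decode(encoded_prefix)
print(raw)  # b'0050;V2019.8.0.0;V2019.8.0.0;2020'
```

The result matches the start of the decoded bytes below, so the Base64 step works. The remaining problem is turning the bytes into text.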

This gave me the byte data below. Is there a better way to process these bytes into a Python list?

b"0050;V2019.8.0.0;V2019.8.0.0;20200407;184821\r\n0070;;7;0;7\r\n0080;11;50;bot.pdf;Driss;C:\\Dat\\Abl\\\r\n0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n

I tried decoding with UTF-8, but it didn't work:

byte_data.decode('utf-8')
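
To see exactly where UTF-8 gives up, a small sketch like this one reports the first byte that strict UTF-8 decoding rejects. It uses the 0090 record from the sample above; first_bad_utf8_byte is a hypothetical helper name, not part of any library.

```python
# The 0090 record copied from the decoded sample above
sample = b"0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n"


def first_bad_utf8_byte(data: bytes):
    """Return (position, byte value, reason) for the first byte UTF-8 rejects, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the offending byte, err.reason the codec's message
        return err.start, data[err.start], err.reason
    return None


print(first_bad_utf8_byte(sample))  # (12, 246, 'invalid start byte')
```

Byte 246 is 0xF6, the `\xf6` in "Zub\xf6r": UTF-8 never uses it to start a character, so the data cannot be UTF-8.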

I tried converting it to a string and reading it as CSV, but that did not help: I ended up with the original data again. I need to keep some of the string as it is and convert \xf6r, \r and \n.

import csv
import io

data = io.StringIO(above_data)
data.seek(0)
csv_reader = csv.reader(data, delimiter=";")
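
The post does not show how the bytes were turned into a string. Assuming it was done with `str(byte_data)` (a guess), that would explain ending up with the original data: `str()` on a bytes object returns its repr, escapes included, while `.decode()` produces real text. The shortened record below is only an illustration.

```python
import csv
import io

raw = b"0090;1;Z;Zub\xf6r;0;0\r\n"

# str() gives the repr: the b'...' wrapper and the \xf6, \r, \n escapes survive as text
as_repr = str(raw)
# .decode() with a suitable codec yields actual characters (ö, CR, LF)
as_text = raw.decode("latin-1")

repr_rows = list(csv.reader(io.StringIO(as_repr), delimiter=";"))
text_rows = list(csv.reader(io.StringIO(as_text, newline=""), delimiter=";"))

print(repr_rows)  # [["b'0090", '1', 'Z', 'Zub\\xf6r', '0', "0\\r\\n'"]]
print(text_rows)  # [['0090', '1', 'Z', 'Zubör', '0', '0']]
```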
NinjaBat

2 Answers


It didn't work with 'utf-8' because the data isn't UTF-8; it's probably 'ISO-8859-1' (Latin-1):

text = byte_data.decode('ISO-8859-1')

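because the byte \xf6 is ö in 'ISO-8859-1'. In UTF-8, 0xF6 can never start a character, so strict UTF-8 decoding raises an error at that byte.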
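
Combining that decode with the csv approach from the question turns the bytes into a list of rows. This is a sketch that assumes every record is ';'-separated and ends in \r\n, as in the visible sample. parse_records is a hypothetical helper, and only the first four records of the (truncated) sample are used.

```python
import csv
import io
from typing import List

# First four records of the decoded sample from the question
byte_data = (
    b"0050;V2019.8.0.0;V2019.8.0.0;20200407;184821\r\n"
    b"0070;;7;0;7\r\n"
    b"0080;11;50;bot.pdf;Driss;C:\\Dat\\Abl\\\r\n"
    b"0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n"
)


def parse_records(data: bytes, encoding: str = "ISO-8859-1") -> List[List[str]]:
    """Decode the raw bytes, then split them into rows of ';'-separated fields."""
    text = data.decode(encoding)
    # newline="" leaves the \r\n terminators for the csv module to handle
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))


for row in parse_records(byte_data):
    print(row)
```

Empty fields (`;;`) come back as empty strings, and the Windows path keeps its backslashes, because the csv module treats backslash as an ordinary character unless an `escapechar` is set.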

mugiseyebrows
  • Must be some lag in my browser! You'd answered ages before I posted, and it didn't show up until about 2 seconds before I submitted. Anyway, yours is indeed the correct answer. – Amiga500 Apr 26 '22 at 10:02

Is it definitely utf-8 encoded?

This might help you decide which codec to use:

import chardet  # third-party package: pip install chardet
print(chardet.detect(byte_data))
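
If installing chardet is not an option, a simpler stdlib-only heuristic is to try candidate codecs in order. This is not detection in chardet's sense: it just returns the first codec that decodes without error. A clean strict UTF-8 decode is a good sign but not proof, and Latin-1 maps every byte to a character, so it must come last as the catch-all. decode_with_fallback and the candidate list are assumptions for illustration.

```python
from typing import Sequence, Tuple

# Strictest codec first; latin-1 accepts any byte sequence, so it goes last.
CANDIDATES = ("utf-8", "latin-1")


def decode_with_fallback(data: bytes, candidates: Sequence[str] = CANDIDATES) -> Tuple[str, str]:
    """Return (codec, text) for the first candidate codec that decodes without error."""
    for codec in candidates:
        try:
            return codec, data.decode(codec)
        except UnicodeDecodeError:
            continue  # this codec rejected the bytes; try the next one
    raise ValueError("none of the candidate codecs could decode the data")


print(decode_with_fallback(b"Zub\xf6r"))                # ('latin-1', 'Zubör')
print(decode_with_fallback("Zubör".encode("utf-8")))    # ('utf-8', 'Zubör')
```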
Amiga500
  • It was not; it was Latin-1, as mugis suggested. I was about to ask whether there is a way to detect the encoding. Thanks – NinjaBat Apr 26 '22 at 10:02