Process mixed bytes data into python list

Question

I am reading data from remote .dat files for EDI data processing.

The original data is a Base64-encoded byte string:

b'MDA1MDtWMjAxOS44LjAuMDtWMjAxOS44LjAuMDsyMDIwMD.........'

I decoded it as follows:

byte_data = base64.b64decode(byte_data)

This gave me the byte data below. Is there a better way to process these bytes into a Python list?

b"0050;V2019.8.0.0;V2019.8.0.0;20200407;184821\r\n0070;;7;0;7\r\n0080;11;50;bot.pdf;Driss;C:\\Dat\\Abl\\\r\n0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n

I tried decoding with utf-8, but that didn't work:

byte_data.decode('utf-8')

I tried converting it to a string and reading it as CSV, but that did not help; I ended up back at the original data. I need to keep some of the string as it is and convert sequences such as \xf6, \r and \n.

data = io.StringIO(above_data)
data.seek(0)
csv_reader = csv.reader(data, delimiter=";")

score 2 · Accepted Answer · answered Apr 26 '22 at 09:54


It didn't work with 'utf-8' because the data is not UTF-8; it's probably 'ISO-8859-1' (latin-1):

text = byte_data.decode('ISO-8859-1')

This works because the byte \xf6 is ö in 'ISO-8859-1'.
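To get from the decoded text to a Python list, one option is to feed it to the csv module. This is only a minimal sketch: it assumes each record is one semicolon-delimited line, it uses a shortened copy of the sample from the question, and the helper name `parse_records` is hypothetical.

```python
import csv
import io


def parse_records(byte_data, encoding="ISO-8859-1"):
    """Decode the bytes and split them into a list of rows (lists of strings)."""
    text = byte_data.decode(encoding)
    # newline="" leaves the \r\n line endings for the csv module to handle
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";")
    return [row for row in reader]


# Shortened sample taken from the question
sample = b"0070;;7;0;7\r\n0090;1;Z;Zub\xf6r;0;0;0;Zub\xf6r;;;Zub\xf6r\r\n"
rows = parse_records(sample)
print(rows)
```

Each row comes back as a list of strings, with empty fields kept as empty strings and \xf6 decoded to ö.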


mugiseyebrows


Must be some lag on my browser! You'd answered it ages before I posted, and it wasn't there till about 2 seconds before I submitted. Anyway, yours is indeed the correct answer. – Amiga500 Apr 26 '22 at 10:02

score 1 · Answer 2 · answered Apr 26 '22 at 10:00


Is it definitely utf-8 encoded?

This might help you work out which decoder to use:

import chardet
print(chardet.detect(byte_data))
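If installing chardet is not an option, a rough standard-library alternative is to try candidate encodings in order. This is only a sketch: the helper name `decode_with_fallback` and the candidate list are assumptions. 'ISO-8859-1' has to come last because it accepts every possible byte and so never raises an error.

```python
def decode_with_fallback(byte_data, encodings=("utf-8", "ISO-8859-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in encodings:
        try:
            return byte_data.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate does not fit the bytes; try the next one
            continue
    raise ValueError("none of the candidate encodings matched")


text, used = decode_with_fallback(b"Zub\xf6r")
print(used, text)
```

A clean decode only shows that the bytes are valid in that encoding, not that the guess is right, so it is worth checking the result by eye.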


Amiga500


It was not... it was latin-1, as mugis suggested. I was about to ask whether there is a way to detect the encoding. Thanks – NinjaBat Apr 26 '22 at 10:02
