
I receive the following bytes message over a socket connection and want to convert it into a string for further processing. I am using Python 3.7.

Below is the code I have tried so far:

import codecs

a = b'0400F224648188E0801200000040000000001941678904000010237890000000000000222220418151856038556051259950760020806002468060046010403319     HSBCBSB8001101234567890MC   100  WITH ORDERIN   FO           AU009006Q\x00\x00\x00\x83\x00007\xa0\x00\x00\x00\x00%\x02010003855604181518562468000000000460100000'

b = codecs.decode(a, 'utf-8')

print(b)

I am getting the following error:

> UnicodeDecodeError: 'utf-8' codec can't decode byte 0x83 in position 208: invalid start byte

How can I convert the data to a string and process it further?

Thanks in advance

Patrick Artner

1 Answer


Your data is not valid UTF-8: bytes such as 0x83 and 0xa0 cannot start a UTF-8 sequence. You can use BeautifulSoup to decode data in an unknown encoding (on Python 3, install the beautifulsoup4 package and import from bs4):

from bs4 import BeautifulSoup

soup = BeautifulSoup(b'0400F224648188E0801200000040000000001941678904000010237890000000000000222220418151856038556051259950760020806002468060046010403319     HSBCBSB8001101234567890MC   100  WITH ORDERIN   FO           AU009006Q\x00\x00\x00\x83\x00007\xa0\x00\x00\x00\x00%\x02010003855604181518562468000000000460100000',
                     'html.parser')
print(soup.contents[0])

print(soup.original_encoding)

to get

0400F224648188E0801200000040000 ... # etc

and

windows-1252
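Once you know (or are willing to assume) the encoding, the standard library alone is enough. Below is a minimal sketch, assuming windows-1252 really is the right guess for your data. The `raw` value is only a shortened excerpt of the message from the question, and the variable names are mine:

```python
# Shortened excerpt of the message from the question (illustration only)
raw = b'AU009006Q\x00\x00\x00\x83\x00007\xa0\x00\x00\x00\x00%\x02010003'

# Strict UTF-8 decoding fails on these bytes, as in the question
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the detected encoding works for this excerpt
text = raw.decode("windows-1252")

# Encoding the string again gives back the original bytes
round_trip = text.encode("windows-1252")

print(utf8_ok)
print(repr(text))
print(round_trip == raw)
```

Keep in mind that a successful decode only means every byte has a mapping in that code page. It does not prove the sender actually used that encoding, so the control bytes in the message may still need separate handling.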

You can also use the bs4 detector separately via UnicodeDammit, and fine-tune it by suggesting which encodings to try first or which ones not to try.
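A rough sketch of using UnicodeDammit on its own, assuming the beautifulsoup4 package is installed. The `raw` value is a shortened excerpt of the message from the question, and the list of encodings to try first is just my suggestion, not something your data dictates:

```python
from bs4 import UnicodeDammit  # assumes beautifulsoup4 is installed

# Shortened excerpt of the message from the question (illustration only)
raw = b'AU009006Q\x00\x00\x00\x83\x00007\xa0\x00\x00\x00\x00%\x02010003'

# The list holds the encodings to try first, in this order
dammit = UnicodeDammit(raw, ["utf-8", "windows-1252"])

text = dammit.unicode_markup         # the decoded str
encoding = dammit.original_encoding  # the encoding that ended up working

print(encoding)
print(repr(text))
```

UnicodeDammit also accepts an `exclude_encodings` argument if you want to rule certain encodings out.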


Patrick Artner