
I am trying to receive data on a socket. It will be a mix of UTF-8 and UTF-16, depending on what is sent to me. I am trying to find a way to detect whether it is UTF-8 or UTF-16, but am running into an issue.

data = b"\x00D\x00E\x00S\x00K\x00T\x00O\x00P\x00-\x00\x15\x04\x19\x04\x19\x04'\x04\x13\x04\x14\x04\x14\x04\x00\x00"

def is_ascii(s):
    return all(ord(c) < 128 for c in s)

def print_to_screen(data):
    if is_ascii(str(data)):
        print("RECV 8: " + data.decode())
    else:
        print("RECV 16: " + data.decode('utf-16'))

The data should be: DESKTOP-ЕЙЙЧГДД

It always prints the data as if it were UTF-8. I am not sure whether I need to change is_ascii or find another way to do this.
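Why `is_ascii(str(data))` always returns True: calling `str()` on a bytes object does not decode it; it returns the printable representation (starting `b"\x00D\x00E...`), which is made only of ASCII characters. Checking the raw byte values would not help either, because every byte in this particular UTF-16 payload is below 128. Passing the bytes directly, as tried in the comments, raises a `TypeError`, because iterating over bytes yields ints and `ord()` expects a one-character string. A small check using the question's own `data` and `is_ascii`:

```python
data = b"\x00D\x00E\x00S\x00K\x00T\x00O\x00P\x00-\x00\x15\x04\x19\x04\x19\x04'\x04\x13\x04\x14\x04\x14\x04\x00\x00"


def is_ascii(s):
    return all(ord(c) < 128 for c in s)


# str() on bytes returns the escaped repr, not decoded text,
# so every character in it is ASCII and the check always passes.
repr_text = str(data)
print(is_ascii(repr_text))         # True

# The raw byte values are all below 128 too, even though this is UTF-16.
print(all(b < 128 for b in data))  # True

# Iterating bytes yields ints, and ord() needs a str, so this raises.
try:
    is_ascii(data)
except TypeError as exc:
    print("TypeError:", exc)
```

So neither an ASCII test on the repr nor one on the raw bytes can tell these two encodings apart for this data.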

EDIT:

data = b"D\x00E\x00S\x00K\x00T\x00O\x00P\x00-\x00\x15\x04\x19\x04\x19\x04'\x04\x13\x04\x14\x04\x14\x04\x00\x00"

try:
    data = data.decode('utf-8')
except UnicodeDecodeError:
    data = data.decode('utf-16')

print(data)

Only half of the data is converted: it prints DESKTOP- and the other half is not decoded.
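Why the `try`/`except` version never reaches the UTF-16 branch: every byte in this payload is below 0x80, so it is valid ASCII and therefore valid UTF-8. `data.decode('utf-8')` succeeds, and the NUL and control characters between the letters are simply invisible when printed. The check below uses the edited data; decoding explicitly as little-endian UTF-16 (`'utf-16-le'`) is an assumption based on the `D\x00E\x00...` byte layout, since the data carries no byte-order mark.

```python
data = b"D\x00E\x00S\x00K\x00T\x00O\x00P\x00-\x00\x15\x04\x19\x04\x19\x04'\x04\x13\x04\x14\x04\x14\x04\x00\x00"

# All bytes are < 0x80, so this is valid UTF-8 and no exception is raised.
as_utf8 = data.decode('utf-8')
print(repr(as_utf8[:4]))   # 'D\x00E\x00'

# Little-endian UTF-16 (assumed from the b"D\x00E\x00..." layout) gives the
# intended text; rstrip() drops the trailing NUL terminator.
as_utf16 = data.decode('utf-16-le').rstrip('\x00')
print(as_utf16)            # DESKTOP-ЕЙЙЧГДД
```

In other words, a successful UTF-8 decode does not prove that the data was UTF-8.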

Doritos
  • Attempting to run `is_ascii(data)` on my machine gives an error. Can you confirm the output of `is_ascii()` on your data string? Additionally, you have a single quote in the middle of your bytes string that is making it 2 strings. – G. Anderson May 21 '19 at 17:53
  • Possible duplicate of [How to detect string byte encoding?](https://stackoverflow.com/questions/15918314/how-to-detect-string-byte-encoding) – G. Anderson May 21 '19 at 18:03
  • https://pypi.org/project/chardet/ – thebjorn May 21 '19 at 18:07
  • Here is your answer, check this out: https://stackoverflow.com/a/12053219/11230028 – Chetan Vashisth May 22 '19 at 05:40

1 Answer


You could try something like this, using the chardet library:

import chardet

# detect() takes the raw bytes (e.g. `data` from the socket), not a str
the_encoding = chardet.detect(data)['encoding']

and that's it!
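chardet returns a best guess (its result dict also carries a `confidence` score). If adding a dependency is not wanted and the sender only ever uses these two encodings, a simpler stdlib-only heuristic may be enough. This is a different technique from chardet, and `guess_decode` is a hypothetical helper. It assumes that (1) each UTF-16 message contains at least one 0x00 byte (true for the question's data, which has ASCII letters and a NUL terminator) while UTF-8 messages never do, and (2) UTF-16 without a byte-order mark is little-endian, as in the edited data.

```python
import codecs


def guess_decode(data: bytes) -> str:
    """Hypothetical helper: decode bytes assumed to be either UTF-8 or UTF-16."""
    # An explicit byte-order mark settles it; the 'utf-16' codec consumes the BOM.
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode('utf-16').rstrip('\x00')
    # Assumption: UTF-16 messages contain a 0x00 byte (ASCII letters or a NUL
    # terminator) and UTF-8 messages never do. UTF-16 always has an even length.
    if b"\x00" in data and len(data) % 2 == 0:
        # Assumption: BOM-less UTF-16 is little-endian; drop the NUL terminator.
        return data.decode('utf-16-le').rstrip('\x00')
    return data.decode('utf-8')


sample = b"D\x00E\x00S\x00K\x00T\x00O\x00P\x00-\x00\x15\x04\x19\x04\x19\x04'\x04\x13\x04\x14\x04\x14\x04\x00\x00"
print(guess_decode(sample))                              # DESKTOP-ЕЙЙЧГДД
print(guess_decode("DESKTOP-ЕЙЙЧГДД".encode('utf-8')))  # DESKTOP-ЕЙЙЧГДД
```

The heuristic fails if a UTF-16 message happens to contain no 0x00 byte at all (for example, purely Cyrillic text with no terminator), so it only fits senders whose messages match the assumptions above.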

pankaj