How are strings converted into bytes in Python, and how does the encoding work?

Question

Recently I found the following code for a specific problem on a coding platform.

exec(bytes('㵔湩異ੴ㵦慬扭慤洺灡椨瑮听⤨献汰瑩⤨਩ⱸ㵹⡦਩㵤慛獢砨稭⬩扡⡳⵹⥴潦⁲ⱺ⁴湩汛獩⡴⡦⤩潦⁲⁩湩‧⨧湩⡴⡔⤩嵝瀊楲瑮搨椮摮硥洨湩搨⤩ㄫ洬湩搨⤩','u16')[2:])

I was completely shocked when I saw it the first time.

Then I tried to figure out what was happening there.

I reduced the code to just the bytes() call and got this:

>>> bytes('㵔湩異ੴ㵦慬扭慤洺灡椨瑮听⤨献汰瑩⤨਩ⱸ㵹⡦਩㵤慛獢砨稭⬩扡⡳⵹⥴潦⁲ⱺ⁴湩汛獩⡴⡦⤩潦⁲⁩湩‧⨧湩⡴⡔⤩嵝瀊楲瑮搨椮摮硥洨湩搨⤩ㄫ洬湩搨⤩', 'u16')
b"\xff\xfeT=input\nf=lambda:map(int,T().split())\nx,y=f()\nd=[abs(x-z)+abs(y-t)for z,t in[list(f())for i in' '*int(T())]]\nprint(d.index(min(d))+1,min(d))"

However, I could not find how that raw string was made.

Can somebody help me in finding out?

The raw string was made by splitting the input (i.e. the code) into two-byte blocks, interpreting each block as a 16-bit little-endian integer, and then looking up the Unicode character with that code point (in UTF-16, every character in the Basic Multilingual Plane is stored as exactly one such 2-byte unit). Because pairs of ASCII bytes produce fairly large code points, the characters usually end up in the eastern/CJK ranges. The resulting string looks like `㵔湩異...`. When reversing the process, `㵔` corresponds to `T=`, `湩` to `in`, `異` to `pu`, and so on. The `u16` codec also prepends the UTF-16 byte order mark `\xff\xfe`, which is why the code strips the first two bytes with `[2:]`. — Tomalak, Jun 17 '21 at 14:58
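The process described in the comment can be sketched in a few lines of Python. This is only an illustration, not necessarily how the original author built the string. The helper names `pack` and `unpack` are made up for this example, and it assumes the source code is pure ASCII, so no byte pair can land in the UTF-16 surrogate range.

```python
def pack(source: str) -> str:
    """Squeeze ASCII source code into half as many characters.

    Hypothetical helper: each pair of bytes is read as one
    little-endian UTF-16 code unit.
    """
    raw = source.encode("ascii")
    if len(raw) % 2:
        # UTF-16 needs an even number of bytes; a trailing space is
        # harmless in most Python source.
        raw += b" "
    return raw.decode("utf-16-le")


def unpack(packed: str) -> bytes:
    """Reverse of pack(): turn the characters back into source bytes.

    'utf-16-le' writes no byte order mark. The 'u16' (utf-16) codec
    used in the question prepends a 2-byte BOM, which is why that
    snippet needs the [2:] slice.
    """
    return packed.encode("utf-16-le")


packed = pack("T=input\n")
print(packed)          # four characters, starting with the one for "T="
print(unpack(packed))  # the original eight bytes
```

The resulting bytes can be handed straight to `exec`, which accepts a bytes object as source, just as the snippet in the question does.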
