I was just learning about encoding strings in Python, and after experimenting a little I got confused by the fact that an empty string ('') encodes to 0 bytes in UTF-8 and ASCII but to 2 bytes in UTF-16. How come?
print(len(''.encode('utf16')))  # prints 2
print(len(''.encode('utf8')))   # prints 0
I guess a big part of the problem is that I don't understand how UTF-16 works. I don't understand why encoding 'spam' in UTF-16 produces 10 bytes instead of just 8 (2 bytes, i.e. 16 bits, for each of the 4 characters). I'm assuming the 2 extra bytes are added by default to any UTF-16 string, for padding or something similar?
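To make the numbers concrete, here is a minimal sketch of what I am measuring. The 'utf-16-le' codec is not part of my original test; it is included only as a comparison, on the assumption that it writes the 16-bit code units with nothing in front of them. The sample strings are arbitrary.

```python
# Compare encoded lengths for a few strings.
# 'utf-16-le' is an added comparison point, not part of the original test.
samples = ['', 'a', 'spam']

for text in samples:
    utf8_len = len(text.encode('utf-8'))
    utf16_len = len(text.encode('utf-16'))
    utf16_le_len = len(text.encode('utf-16-le'))
    print(repr(text), utf8_len, utf16_len, utf16_le_len)
# ''     0  2  0
# 'a'    1  4  2
# 'spam' 4 10  8

# Difference between 'utf-16' and 'utf-16-le' for every sample.
overhead = {len(s.encode('utf-16')) - len(s.encode('utf-16-le')) for s in samples}
print(overhead)  # {2}
```

The difference is 2 bytes regardless of the string's length, which suggests the extra bytes are a fixed prefix rather than per-character padding.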
Edit:
I am NOT confused about the basics of how UTF-8 and UTF-16 work or how they differ in storing individual characters. I am confused about why the absence of any characters (an empty string) takes 2 bytes in UTF-16 but 0 bytes in UTF-8, rather than the same size (0 or 1 byte) in both.
The linked question does not answer my question.