I don't understand how Python stores the unicode type in memory. I already know that a str string 'xxx' contains bytes produced by some encoding such as UTF-8, while a unicode string u'xxx' contains abstract characters represented by code points. But how is a unicode string actually stored in main memory?
To be more explicit, sys.getsizeof() returns these results for str and unicode:
>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('1')
38
>>> sys.getsizeof('1234')
41
>>> sys.getsizeof(u'')
50
>>> sys.getsizeof(u'1')
52
>>> sys.getsizeof(u'1234')
58
>>> sys.getsizeof(u'1好')
54
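The size of a str obviously depends on which encoding it uses; here each ASCII character adds 1 byte. For unicode, however, every character seems to take exactly 2 bytes of memory: u'1' is 2 bytes larger than u'', u'1234' is 8 bytes larger, and u'1好' is 4 bytes larger, even though '好' is not an ASCII character. So how are these unicode characters stored in main memory?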
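For reference, the per-character growth can be measured directly rather than worked out by hand. This is only a rough sketch: bytes_per_char is a helper name made up for this post, and the numbers it prints depend on the interpreter version and build, so they may not match the sizes listed above.

```python
import sys

def bytes_per_char(sample):
    """Average extra bytes sys.getsizeof() reports per character of `sample`,
    measured against the empty string of the same type.
    (Hypothetical helper, not part of any library.)"""
    empty = type(sample)()             # '' for str, u'' for unicode
    overhead = sys.getsizeof(empty)    # fixed per-object cost
    return (sys.getsizeof(sample) - overhead) / float(len(sample))

# u'1\u597d' is u'1好', written with an escape so the source file stays ASCII
for s in ('1234', u'1234', u'1\u597d'):
    print("%r: %.1f bytes per character" % (s, bytes_per_char(s)))
```

Subtracting the size of the empty string removes the fixed object header, so what remains is the storage that grows with the string's length.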
Any help would be appreciated.