I don't understand how Python stores the unicode type in memory. I already know that a str string 'xxx' contains bytes encoded in some encoding such as UTF-8, while a unicode string u'xxx' contains abstract characters represented by code points. But how is a unicode string stored in main memory?
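To make the distinction between bytes and code points concrete, here is a minimal sketch. The variable names are only illustrative, and the non-ASCII character is written as an escape so the snippet does not depend on the source file's encoding:

```python
# A unicode string counts code points; its encoded form counts bytes.
text = u'1\u597d'               # two code points: '1' and U+597D
encoded = text.encode('utf-8')  # the same text as UTF-8 bytes

print(len(text))     # 2 code points
print(len(encoded))  # 4 bytes: 1 for '1' and 3 for U+597D
```

The length of the encoded form changes with the encoding chosen, while the number of code points does not.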

To be more explicit, sys.getsizeof() gives these results for str and unicode:

>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('1')
38
>>> sys.getsizeof('1234')
41

>>> sys.getsizeof(u'')
50
>>> sys.getsizeof(u'1')
52
>>> sys.getsizeof(u'1234')
58
>>> sys.getsizeof(u'1好')
54

It's obvious that the size of a str depends on which encoding it uses. For unicode, though, it seems that each character always takes 2 bytes of memory, whether it is ASCII or not. So how are these unicode characters actually stored in main memory?
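One way to check the "2 bytes per character" guess on a given interpreter is to measure how much sys.getsizeof() grows as a string gets longer. This is only a rough sketch: bytes_per_char is a hypothetical helper, it assumes CPython (where sys.getsizeof() is supported for strings), and the result may differ between Python versions and builds.

```python
import sys

def bytes_per_char(sample_char, n=8):
    """Estimate the memory cost of one extra character.

    Compares two strings made of the same character, of length n and 2n,
    so the fixed object overhead cancels out of the difference.
    """
    short = sample_char * n
    longer = sample_char * (2 * n)
    return (sys.getsizeof(longer) - sys.getsizeof(short)) / float(n)

print(bytes_per_char(u'1'))       # an ASCII character
print(bytes_per_char(u'\u597d'))  # a non-ASCII character (U+597D)
print(sys.maxunicode)             # largest code point this build supports
```

If sys.maxunicode prints 65535, the interpreter is a "narrow" build, which would be consistent with the 2-byte steps in the numbers above; 1114111 indicates a build that can represent every code point directly.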

Any help would be appreciated.

Splash