Memory use of unicode strings in Python

Asked Aug 05 '17 at 02:57

Active Aug 05 '17 at 02:57

Viewed 191 times

I don't know how Python stores the unicode type in memory. I already know that a str string 'xxx' contains bytes encoded in some encoding such as UTF-8, while a unicode string u'xxx' contains abstract characters represented by code points. But how is a unicode string stored in main memory?

To be more explicit, sys.getsizeof() gives these results for str and unicode:

>>> import sys
>>> sys.getsizeof('')
37
>>> sys.getsizeof('1')
38
>>> sys.getsizeof('1234')
41

>>> sys.getsizeof(u'')
50
>>> sys.getsizeof(u'1')
52
>>> sys.getsizeof(u'1234')
58
>>> sys.getsizeof(u'1好')
54

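Clearly the size of a str depends on the encoding it uses, since it grows by one byte for each encoded byte. For unicode, though, each character seems to always take 2 bytes of memory: u'1' is 2 bytes larger than u'', and both u'1234' and u'1好' grow by 2 bytes per character. So how are these unicode characters stored in main memory?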
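A rough way to check the per-character cost directly is sketched below. The helper name bytes_per_char and the repeat count n are made up for illustration, and the sketch assumes CPython, where sys.getsizeof() reports the object's own allocation. It compares two strings that differ only in length, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=8):
    """Estimate the storage cost of one copy of `ch` inside a string.

    Compares a string of 2*n copies with a string of n copies, so the
    fixed per-object overhead cancels and only the character data remains.
    `ch` is expected to be a unicode string of length 1.
    """
    longer = sys.getsizeof(ch * (2 * n))
    shorter = sys.getsizeof(ch * n)
    return (longer - shorter) / float(n)

# sys.maxunicode hints at the build: 0xFFFF suggests 2-byte code units,
# 0x10FFFF a wider (or flexible) internal representation.
print("sys.maxunicode = %#x" % sys.maxunicode)

for label, ch in [("ASCII digit", u"1"), ("CJK character", u"\u597d")]:
    print("%s: %.1f bytes per character" % (label, bytes_per_char(ch)))
```

On an interpreter that produces the figures above, this should report 2 bytes for both u'1' and u'好', which matches (58 - 50) / 4 = 2. Other builds may differ: a wide Python 2 build would use 4 bytes per character, and CPython 3.3 and later choose 1, 2, or 4 bytes per character depending on the widest code point in the string.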

Any help would be appreciated.

Splash

Have a look at this https://stackoverflow.com/questions/26079392/how-is-unicode-represented-internally-in-python – Srikanth Aug 05 '17 at 03:04
@Srikanth @John Zwinck Got it. Thanks a lot for your help! – Splash Aug 05 '17 at 03:09
