
Not sure if I'm missing something obvious, but why do two equivalent objects (in this case strings) have different memory sizes in Python 2 vs. Python 3?

e.g.

Python 2.7.12 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import sys
>>> sys.getsizeof('Hello World')
48

Python 3.6.0 |Continuum Analytics, Inc.| (default, Dec 23 2016, 12:22:00)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof('Hello World')
60
Joe Healey
  • In Python 3.x, strings can represent all *Unicode* characters; since they can contain more data, the representation is typically a bit larger. – Willem Van Onsem Jul 10 '18 at 19:29
  • The short answer: Python 3 is significantly different from Python 2. The string implementation, which handles Unicode differently, is one difference; long integers by default are another, and so on. – Rory Daulton Jul 10 '18 at 19:30
  • You are *not comparing equivalent objects*. You are comparing a Python 2 bytestring with a Python 3 Unicode string. At the very least you'd want to compare `sys.getsizeof(u'Hello World')` (Python 2.7: 72, Python 3.6: 60) to use Unicode strings in both places, or `sys.getsizeof(b'Hello World')` (Python 2.7: 48, Python 3.6: 44) to use byte strings in both. – Martijn Pieters Jul 10 '18 at 19:31
  • Or in Python 3, you can check `sys.getsizeof(b'Hello World')` and get 44 back. – Calum You Jul 10 '18 at 19:32
  • I see. Under my Py2.7 install I get the unicode string size as 96 (exactly double), rather than the 72 you got @MartijnPieters. Is this significant or just down to particular locales/distributions/platforms etc.? – Joe Healey Jul 10 '18 at 19:42
  • @JoeHealey: you probably have a UCS4 build in that case (`sys.maxunicode` is then 1114111, rather than 65535 on a UCS2 build). – Martijn Pieters Jul 10 '18 at 19:44
  • @CalumYou: exact sizes are platform specific (32-bit vs 64-bit, OS-specific choices for certain C constants, etc.). – Martijn Pieters Jul 10 '18 at 19:47
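
The like-for-like comparison suggested in the comments can be sketched as a small script that runs unchanged on both interpreters. This is only a minimal sketch: the names `TEXT`, `DATA`, `report_sizes` and `bytes_per_char` are made up for illustration, and the absolute sizes it prints depend on the build and platform, as noted above.

```python
import sys

# Explicit prefixes so both interpreters see the same two types:
# u'' is a Unicode string and b'' is a byte string on Python 2.7 and 3.3+.
TEXT = u'Hello World'
DATA = b'Hello World'


def report_sizes():
    """Collect getsizeof() results for the two equivalent literals."""
    return {
        'python': '%d.%d' % sys.version_info[:2],
        'unicode_size': sys.getsizeof(TEXT),
        'bytes_size': sys.getsizeof(DATA),
        # 65535 on a narrow (UCS2) Python 2 build, 1114111 otherwise.
        'maxunicode': sys.maxunicode,
    }


def bytes_per_char(ch):
    """Estimate per-character storage by differencing two string lengths.

    Subtracting the sizes cancels the fixed object header, leaving only
    the cost of the 10 extra characters.
    """
    return (sys.getsizeof(ch * 20) - sys.getsizeof(ch * 10)) // 10


if __name__ == '__main__':
    for key, value in sorted(report_sizes().items()):
        print('%s: %s' % (key, value))
    print('bytes per ASCII char: %d' % bytes_per_char(u'a'))
    print('bytes per non-Latin-1 BMP char: %d' % bytes_per_char(u'\u20ac'))
```

Running it under each interpreter and comparing `unicode_size` with `unicode_size`, and `bytes_size` with `bytes_size`, avoids the str-vs-str mismatch in the question. The per-character figure should also hint at the UCS2/UCS4 difference on Python 2, whereas CPython 3.3+ picks the width per string.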
