
As the interactive session below shows, 50,000,000 records take only 404 MB of memory. Why? If one record takes 83 bytes, 50,000,000 records should take about 3957 MB.

>>> import sys
>>> a=[]
>>> for it in range(5*10**7):a.append("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"+str(it))
... 
>>> print(sys.getsizeof(a)/1024**2)
404.4306411743164
>>> print(sys.getsizeof("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v"))
83
>>> print(83*5*10**7/1024**2)
3957.7484130859375
>>> 
purplecity
  • Someone else had a similar query as you did but went a bit further, so this is more of a related thread: [Deep version of sys.getsizeof](https://stackoverflow.com/questions/14208410/deep-version-of-sys-getsizeof) – metatoaster Jan 17 '19 at 03:08

1 Answer


`sys.getsizeof` only reports the cost of the list itself, not its contents. So you're seeing the cost of storing the list object header plus (a little over) 50M pointers; you're likely on a 64-bit system with eight-byte pointers, so storage for 50M pointers is ~400 MB. Getting the true size would require calling `sys.getsizeof` on each object, each object's `__dict__` (if applicable), etc., recursively, and it won't be 100% accurate since some of the objects (e.g. small ints) are likely shared; this is not a rabbit hole you want to go down.
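For illustration, here is a minimal sketch of the recursive approach described above (the "Deep version of sys.getsizeof" thread linked in the comments has more complete variants). `deep_getsizeof` is a name chosen here, not a standard-library function, and as noted it still won't be exact for interned strings, `__slots__`, and other shared or exotic objects:

```python
import sys

def deep_getsizeof(obj, seen=None):
    """Recursively total sys.getsizeof over an object and its contents.
    A `seen` set of ids ensures shared objects are only counted once."""
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += deep_getsizeof(key, seen) + deep_getsizeof(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += deep_getsizeof(item, seen)
    if hasattr(obj, "__dict__"):
        size += deep_getsizeof(vars(obj), seen)
    return size

# A 1,000-element version of the list from the question:
a = ["miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v" + str(it) for it in range(1000)]
print(sys.getsizeof(a))   # shallow: list header + pointers only
print(deep_getsizeof(a))  # deep: also counts the 1,000 strings
```

Since every string in `a` is unique, the deep size here is exactly the shallow size plus the sum of the individual string sizes.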

ShadowRanger
  • Yes, 64-bit. I care about the actual storage (including the list's contents). So does that mean a list with 50,000,000 records actually uses 3957 + 404 MB in total? – purplecity Jan 17 '19 at 03:14
  • @purplecity: Well, your records are 83 bytes, plus the length of the stringified `int` you're adding on, so a bit larger than that, more like 4339 + 404 M, but yes, that's roughly correct. – ShadowRanger Jan 17 '19 at 03:58
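Building on the arithmetic in that comment, here is a sketch that totals the per-string sizes analytically instead of building 50M strings. It assumes CPython's compact ASCII string layout on 64-bit builds: a 49-byte header plus one byte per character, which is why the 34-character prefix reports 83 bytes above, and why `str(it)` adds one byte per decimal digit:

```python
import sys

# 49-byte header + 34 ASCII chars = 83 bytes (matches the question's output)
base = sys.getsizeof("miJ8ZNFG9iFqiQQohvyTWwqsij2rJCiZ7v")
n = 5 * 10**7

# Group the values of `it` by decimal digit count rather than iterating all 50M.
total = base + 1                    # it == 0: one extra digit
for digits in range(1, 9):          # 1-digit through 8-digit values of it
    lo, hi = 10 ** (digits - 1), min(10 ** digits, n)
    total += (hi - lo) * (base + digits)

print(total / 1024**2)              # ~4329 MB of string payload alone
```

That lands in the same ballpark as the rough figure in the comment, on top of the ~404 MB for the list's pointer array.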