
Example:

import sys

class Test():
    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'
        self.d = 'd'
        self.e = 'e'

if __name__ == '__main__':
    test = [Test() for i in range(100000)]
    print(sys.getsizeof(test))

In Windows Task Manager: I am seeing a jump of ~20 MB when creating a list of 100,000 objects vs. 10.

Using sys.getsizeof(): For a list of 100,000, I get 412,236 bytes; for a list of 10, I get 100 bytes.

This seems hugely disproportionate. Why is this happening?

Jeff
  • You can use iterators (`xrange` in this case) to save memory – neoascetic Jul 31 '12 at 23:02
  • 20MB by what metric? Private bytes? Virtual size? – Daniel DiPaolo Jul 31 '12 at 23:03
  • @Daniel DiPaolo: Windows Task Manager defines it as: Memory (Private Working Virtual Set). (And it's more than my Internet Explorer is using right now!) – Jeff Jul 31 '12 at 23:06
  • @neoascetic: True, but this doesn't help account for the memory usage (I tried both) – Jeff Jul 31 '12 at 23:07
  • @Jeff yeah I see the same thing with both numbers actually, and a `gc.collect()` actually fixes it right up if I `del` the object. Doesn't really answer your question but I guess it confirms that it's related to that object. – Daniel DiPaolo Jul 31 '12 at 23:08
  • `sys.getsizeof` returns a shallow size: it doesn't include the objects contained in the list. – interjay Jul 31 '12 at 23:10
  • @interjay not sure about that since `test` is a `list` and an empty list is not 400k bytes whereas one with 100k of these `Test` items in it is – Daniel DiPaolo Jul 31 '12 at 23:11
  • @DanielDiPaolo: If you're not sure, read the documentation. The size includes the memory allocated by the list itself (which contains pointers) but not the objects pointed to. – interjay Jul 31 '12 at 23:13
  • @Jeff: sorry for the partial edit revert, but it changed the question to the point where neither answer would have made sense. I kept your attribution to interjay. Otherwise we might have to close the question as not making sense. – ninjagecko Jul 31 '12 at 23:54
  • @ninjagecko: Thanks. I just edited the question to reflect the new title. – Jeff Jul 31 '12 at 23:55
  • @Jeff: last edit seems just fine! Take care. – ninjagecko Jul 31 '12 at 23:56
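interjay's point about shallow sizes is easy to check directly. A minimal sketch using the question's class (exact byte counts vary by Python version and platform): `sys.getsizeof` on the list reports only the list header and its array of pointers, while summing each element's own size (and its `__dict__`) accounts for the tens of megabytes Task Manager shows.

```python
import sys

class Test(object):
    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'
        self.d = 'd'
        self.e = 'e'

test = [Test() for i in range(100000)]

# Shallow size: the list object itself -- header plus one pointer
# per element. The Test instances are NOT included.
shallow = sys.getsizeof(test)

# Adding each instance's own size and its per-instance __dict__
# approximates the real footprint.
deep = shallow + sum(sys.getsizeof(t) + sys.getsizeof(t.__dict__)
                     for t in test)

print(shallow)  # hundreds of kilobytes (pointers only)
print(deep)     # tens of megabytes
```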

2 Answers


The memory assigned is not disproportionate; you are creating 100,000 objects! As you can see, they take up roughly 34 megabytes of space:

>>> sys.getsizeof(Test()) + sys.getsizeof(Test().__dict__)
344
>>> (sys.getsizeof(Test()) + sys.getsizeof(Test().__dict__)) * 100000 / 10**6
34.4 # megabytes

You can get a modest improvement with `__slots__`, but you will still need about 20 MB of memory to store those 100,000 objects.

>>> sys.getsizeof(Test2()) + sys.getsizeof(Test2().__slots__)
200
>>> (sys.getsizeof(Test2()) + sys.getsizeof(Test2().__slots__)) * 100000 / 10**6
20.0 # megabytes

(With credit to mensi's answer: `sys.getsizeof` does not follow references. You can use tab-completion or `dir()` to see most of an object's attributes.)

See the SO answer "Usage of __slots__?" and the language reference: http://docs.python.org/release/2.5.2/ref/slots.html

To use __slots__:

class Test2(object):  # in Python 2, __slots__ only works on new-style classes
    __slots__ = ['a', 'b', 'c', 'd', 'e']

    def __init__(self):
        ...
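A side-by-side sketch of the effect (the class names `Plain` and `Slotted` are illustrative, not from the question): a slotted instance stores its attributes inline and has no per-instance `__dict__` at all, which is where the savings come from.

```python
import sys

class Plain(object):
    def __init__(self):
        self.a, self.b, self.c, self.d, self.e = 'a', 'b', 'c', 'd', 'e'

class Slotted(object):
    __slots__ = ['a', 'b', 'c', 'd', 'e']
    def __init__(self):
        self.a, self.b, self.c, self.d, self.e = 'a', 'b', 'c', 'd', 'e'

p, s = Plain(), Slotted()

# The plain instance pays for a separate dict object on top of the
# instance itself; the slotted instance has no __dict__ attribute.
print(sys.getsizeof(p) + sys.getsizeof(p.__dict__))
print(sys.getsizeof(s))
print(hasattr(s, '__dict__'))  # False
```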
ninjagecko
  • This has nothing to do with slots, it has to do with the memory usage of the Python interpreter when allocating objects. Slots are a way to reduce that usage, but he's not using them here. – Daniel DiPaolo Jul 31 '12 at 23:09
  • @ninjagecko: I tried adding the `__slots__` line of code into the class definition, but it seems to make little if any difference in memory usage. (Py 2.7) – Jeff Jul 31 '12 at 23:15
  • @Jeff: are you measuring this with the same thing you used to measure "20MB", or with `sys.getsizeof`? – ninjagecko Jul 31 '12 at 23:35
  • @ninjagecko: I stand corrected as far as sys.getsizeof is concerned. But it does not resolve the issue of how much ram it is gobbling, according to task manager (which remains about the same with or without `__slots__`) – Jeff Jul 31 '12 at 23:37
  • @Jeff: there's nothing weird going on here: it's math. Your 100000 objects take up in total roughly 20 megabytes of space. See mensi's answer or this revised answer. (Daniel DiPaolo is incorrect.) You should edit your answer's title to be "sys.getsizeof does adequately account for most objects"; the memory assigned is not disproportional. – ninjagecko Jul 31 '12 at 23:44
  • @ninjagecko my comment at the time wasn't incorrect in the context of your hasty and incomplete answer, which you have since completed with the relevant portion – Daniel DiPaolo Aug 02 '12 at 23:21

Every instance references a dict for its `__dict__`, which is 272 bytes on my machine for your example. Multiply that by 100,000.
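A sketch of that measurement (byte counts differ across Python versions and platforms; 272 was the figure in this answer's environment):

```python
import sys

class Test(object):
    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'
        self.d = 'd'
        self.e = 'e'

per_dict = sys.getsizeof(Test().__dict__)  # the per-instance dict
per_obj = sys.getsizeof(Test())            # the instance itself
total_mb = (per_dict + per_obj) * 100000 / 10**6

print(per_dict, per_obj)
print(round(total_mb, 1), "MB for 100000 instances")
```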

mensi