14

I'm storing a lot of complex data in tuples/lists, but would prefer to use small wrapper classes to make the data structures easier to understand, e.g.

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
print(p.last)
...

would be preferable over

p = ['foo', 'bar']
print(p[1])
...

however there seems to be a horrible memory overhead:

l = [Person('foo', 'bar') for i in range(10000000)]
# ipython now takes 1.7 GB RAM

and

del l
l = [('foo', 'bar') for i in range(10000000)]
# now just 118 MB RAM

Why? Is there an obvious alternative solution that I didn't think of?

Thanks!

(I know, in this example the 'wrapper' class looks silly. But when the data becomes more complex and nested, it is more useful)

seb314
    `collections.namedtuple` seems like it was made for this purpose, but it takes around `1.1 GB` for your example. Not much better. – randomir Jul 15 '17 at 22:28
  • 2
    Look into `__slots__` or move to Python 3 for [key-sharing dictionaries](https://www.python.org/dev/peps/pep-0412/). – Ashwini Chaudhary Jul 15 '17 at 22:39
  • 1
    In the case of tuples, I believe it just references the same tuple 10 million times. When you create an object, either a class instance or a new tuple, it uses a lot more memory – Garr Godfrey Jul 15 '17 at 22:42
  • 1
    As indicated in the answers, your tuple example only creates a single tuple object. You should create a test case where you create a lot of *different* tuples vs custom objects and see how the performance is. – BrenBarn Jul 15 '17 at 22:42
  • 1
    try randomizing the values, you should get a different result. – Garr Godfrey Jul 15 '17 at 22:42
  • Related: [Is `namedtuple` really as efficient in memory usage as tuples? My test says NO](https://stackoverflow.com/q/41003081/846892) – Ashwini Chaudhary Jul 15 '17 at 22:42

4 Answers

24

As others have said in their answers, you'll have to generate different objects for the comparison to make sense.

So, let's compare some approaches.

tuple

l = [(i, i) for i in range(10000000)]
# memory taken by Python3: 1.0 GB

class Person

class Person:
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 2.0 GB

namedtuple (tuple + __slots__)

from collections import namedtuple
Person = namedtuple('Person', 'first last')

l = [Person(i, i) for i in range(10000000)]
# memory: 1.1 GB

namedtuple is basically a class that extends tuple and uses __slots__ for all named fields, but it adds field getters and some other helper methods (you can see the exact code generated by passing verbose=True; that parameter was removed in Python 3.7).
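A quick sketch of what that buys you: named and positional access both work on the same object, because the generated class really is a tuple subclass.

```python
from collections import namedtuple

# Same Person definition as in the answer above.
Person = namedtuple('Person', 'first last')
p = Person('foo', 'bar')

# Field access by name and by index both work, since Person subclasses tuple.
print(p.first, p[1])         # foo bar
print(isinstance(p, tuple))  # True
```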

class Person + __slots__

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

l = [Person(i, i) for i in range(10000000)]
# memory: 0.9 GB

This is a trimmed-down version of namedtuple above. A clear winner, even better than pure tuples.
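To see where the savings come from, the per-instance sizes can be compared with `sys.getsizeof` (the exact byte counts vary by Python version and platform, so this sketch only compares the two, rather than asserting absolute numbers):

```python
import sys

class PersonDict:
    def __init__(self, first, last):
        self.first = first
        self.last = last

class PersonSlots:
    __slots__ = ('first', 'last')
    def __init__(self, first, last):
        self.first = first
        self.last = last

d = PersonDict(1, 2)
s = PersonSlots(1, 2)

# A regular instance pays for the object header *plus* a per-instance
# __dict__; a slotted instance stores its two references inline.
dict_total = sys.getsizeof(d) + sys.getsizeof(d.__dict__)
slot_total = sys.getsizeof(s)
print(dict_total, slot_total)
```

On any 64-bit CPython the slotted total comes out well below the dict-based total, which is the effect the 0.9 GB vs 2.0 GB numbers above are showing at scale.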

randomir
  • Thanks for the nice overview! In case anyone wonders how 2×10M integers can take up 1000 MB of memory, this seems to be due to the containing list + references: `import numpy as np` `l = np.array([(i, i) for i in range(10000000)])` will only take 189 MB (after taking 1 GB for a short time during construction). This doesn't work with the class instances though (references?). – seb314 Jul 16 '17 at 12:06
  • Actually, `np.array([(i, i) for i in range(10000000)])` will create a homogeneous 2-D array, `10000000x2`, of `dtype('int64')`, meaning the size of such array is `~ 8 x N_elem` bytes, or in this case `~160 MB`. – randomir Jul 16 '17 at 14:24
6

Using __slots__ decreases the memory footprint quite a bit (from 1.7 GB to 625 MB in my test), since each instance no longer needs to hold a dict to store the attributes.

class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

The drawback is that you can no longer add attributes to an instance after it is created; the class only provides memory for the attributes listed in the __slots__ attribute.
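The restriction is easy to demonstrate (a minimal sketch; `middle` is just a hypothetical extra attribute):

```python
class Person:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

p = Person('foo', 'bar')
p.first = 'baz'            # listed in __slots__: assignment works

try:
    p.middle = 'qux'       # not listed in __slots__
except AttributeError:
    print('cannot add attributes outside __slots__')
```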

Arnaud P
chepner
  • 1
    I've corrected what I thought was a 'typo' in your answer, please roll back with my apologies if it wasn't. – Arnaud P Dec 18 '17 at 14:31
  • 1
    No, the correction was valid. It's the instance of `Person` to which you can no longer add new attributes. You probably can't add attributes to `first` or `last`, either, but for entirely different reasons :) – chepner Dec 18 '17 at 15:35
2

There is yet another way to reduce the memory occupied by objects: turn off support for cyclic garbage collection, in addition to dropping __dict__ and __weakref__. This is implemented in the recordclass library:

$ pip install recordclass

>>> import sys
>>> from recordclass import dataobject, make_dataclass

Create the class:

class Person(dataobject):
    first: str
    last: str

or

>>> Person = make_dataclass('Person', 'first last')

The result (Python 3.9, 64-bit):

>>> print(sys.getsizeof(Person(100,100)))
32

For a __slots__-based class we have (Python 3.9, 64-bit):

class PersonSlots:
    __slots__ = ['first', 'last']
    def __init__(self, first, last):
        self.first = first
        self.last = last

>>> print(sys.getsizeof(PersonSlots(100, 100)))
48

As a result, more memory savings are possible.

For the dataobject-based class:

l = [Person(i, i) for i in range(10000000)]
# memory size: 409 MB

For the __slots__-based class:

l = [PersonSlots(i, i) for i in range(10000000)]
# memory size: 569 MB
intellimath
-1

In your second example, you only create one object, because tuples are constants.

>>> l = [('foo', 'bar') for i in range(10000000)]
>>> id(l[0])
4330463176
>>> id(l[1])
4330463176

Classes have the overhead that attributes are stored in a dictionary; that's why namedtuples need only about half the memory.
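The identity claim above can be checked directly. Note this is a CPython implementation detail (constant folding of the tuple literal), not a language guarantee:

```python
# The literal tuple is constant-folded by CPython, so every element of the
# first list references the same object; tuples built at run time are
# distinct objects with equal contents.
l_literal = [('foo', 'bar') for i in range(5)]
l_built = [tuple(['foo', 'bar']) for i in range(5)]

print(len({id(t) for t in l_literal}))   # 1 on CPython: one shared object
print(len({id(t) for t in l_built}))     # 5: distinct objects
```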

Daniel
  • While it's true that tuples are constants, that doesn't explain the difference here. `[tuple(['foo', 'bar']) for i in range(N)]` creates N constant (but distinct) tuple objects. – vaultah Jul 15 '17 at 22:57
  • I didn't downvote, but the reason is not simply that "tuples are constant". It's basically a CPython optimization that works on some kinds of tuple literals; for example, `(1, 2, 3/1)` won't result in the same ID in CPython 2, because `3/1` can't be constant-folded in CPython 2. – Ashwini Chaudhary Jul 15 '17 at 22:59