I want to implement a to_dict method that behaves like the built-in __dict__ attribute but lets me add custom logic. (It is used to construct a pandas DataFrame; see the example below.)

However, I found that my to_dict method is ~25% slower than __dict__, even when both return the same contents. How can I improve my code?

import pandas as pd

class Foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def to_dict(self):
        # Builds a new dict from the instance attributes on every call.
        return {
            'a': self.a,
            'b': self.b,
            'c': self.c,
            'd': self.d,
        }

list_test = [Foo(i, i, i, i) for i in range(100000)]

%%timeit
pd.DataFrame(t.to_dict() for t in list_test)
# Output: 199 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
pd.DataFrame(t.__dict__ for t in list_test)
# Output: 156 ms ± 948 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

A digression from this question, but related to my final goal: what is the most efficient way to construct a pandas DataFrame from a list of custom objects? My current approach is taken from https://stackoverflow.com/a/54975755/1087924

Green Cloak Guy
GoCurry
  • They do not do exactly the same thing. `to_dict` creates a new object every time, whereas `t.__dict__` already exists. – wim May 16 '19 at 17:51
  • Generate a `dict` object on the creation of an instance of `Foo`, and just update it when the values are updated. Then on `to_dict`, you return the object instead of generating it at every call. Should be faster. – Mikael May 16 '19 at 17:55
  • Note, `__dict__` is *not a function*, it is simply an attribute that contains the *namespace* of that object. – juanpa.arrivillaga May 16 '19 at 18:00

1 Answer

__dict__ does not “convert” an object to a dict (unlike __int__, __str__, etc.); it is where the object's (writable) attributes are stored.
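
A small sketch of what that means in practice. The class name `Point` and the variable names are made up for illustration. It shows that `__dict__` is the live storage behind the instance, so reading it builds nothing, while a hand-written `to_dict` hands back a fresh dict on each call:

```python
class Point:
    """Hypothetical class used only for this illustration."""

    def __init__(self, x):
        self.x = x

    def to_dict(self):
        # Builds a new dict on every call.
        return {'x': self.x}


p = Point(1)

# __dict__ is the attribute storage itself: the same object on every access.
same_storage = p.__dict__ is p.__dict__

# to_dict() allocates a new dict per call, even though the contents are equal.
fresh_each_call = p.to_dict() is not p.to_dict()

# Writing through __dict__ changes the attribute, because it *is* the storage.
p.__dict__['x'] = 42
value_after_write = p.x

print(same_storage, fresh_each_call, value_after_write)
```

This is also why handing `__dict__` to other code deserves some care: whoever receives it can modify the instance through it.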

I think your implementation is reasonably efficient. Consider this simplified example:

import dis

class Foo:
    def __init__(self, a):
        self.a = a
    def to_dict(self):
        return {'a': self.a}

foo = Foo(1)

dis.dis(foo.to_dict)
dis.dis('foo.__dict__')

We can see that Python looks up the attributes and creates a new dict every time (plus you'd need to make a call to .to_dict, not shown here):

  7           0 LOAD_CONST               1 ('a')
              2 LOAD_FAST                0 (self)
              4 LOAD_ATTR                0 (a)
              6 BUILD_MAP                1
              8 RETURN_VALUE

while accessing an existing attribute is much simpler:

  1           0 LOAD_NAME                0 (foo)
              2 LOAD_ATTR                1 (__dict__)
              4 RETURN_VALUE
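
To check the difference on your own machine, here is a rough sketch using the standard `timeit` module on the simplified `Foo` from above. The repeat count and the `namespace` variable are arbitrary choices of mine, and the absolute numbers will vary with hardware and Python version, so no timings are quoted here:

```python
import timeit


class Foo:
    def __init__(self, a):
        self.a = a

    def to_dict(self):
        return {'a': self.a}


foo = Foo(1)

# Both expressions give equal dicts; only the work needed to obtain them differs.
assert foo.to_dict() == foo.__dict__

# Run each expression in a small namespace that only contains `foo`.
namespace = {'foo': foo}
n = 100_000  # arbitrary repeat count

t_method = timeit.timeit('foo.to_dict()', globals=namespace, number=n)
t_attr = timeit.timeit('foo.__dict__', globals=namespace, number=n)

print(f"to_dict(): {t_method:.4f} s for {n} calls")
print(f"__dict__:  {t_attr:.4f} s for {n} accesses")
```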

You could however store your custom representation on the instance, achieving the same exact bytecode as with __dict__, but then you need to update it correctly on all changes to Foo (which will cost some speed and memory). If updates are uncommon in your use-case, this could be an acceptable trade-off.

In your example, a simple option is to override __getattribute__, but I'm guessing Foo has other attributes, so having setters is probably going to be more convenient:

class Foo:
    def __init__(self, a):
        self.dict = {}
        self.a = a

    @property
    def a(self):
        return self._a

    @a.setter
    def a(self, value):
        self._a = value
        self.dict['a'] = value

foo = Foo(1)
print(foo.dict)  # {'a': 1}
foo.a = 10
print(foo.dict)  # {'a': 10}
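
If writing one property per attribute gets tedious, another way to keep such a cached dict in sync is to hook attribute assignment with `__setattr__`. This is only a sketch and not the approach shown above: the class `Bar` and the names `_exported` and `cache` are hypothetical, and it does not handle `del bar.a`.

```python
class Bar:
    """Hypothetical class; `_exported` and `cache` are made-up names."""

    # Names of the attributes that should be mirrored into the cached dict.
    _exported = frozenset({'a', 'b'})

    def __init__(self, a, b):
        # Create the cache first so __setattr__ can write into it afterwards.
        self.cache = {}
        self.a = a
        self.b = b
        self.note = 'not mirrored'

    def __setattr__(self, name, value):
        # Store the attribute normally, then mirror it if it is exported.
        super().__setattr__(name, value)
        if name in self._exported:
            self.cache[name] = value


bar = Bar(1, 2)
print(bar.cache)
bar.a = 10
print(bar.cache)
```

The trade-off is the same as with the setters: every assignment becomes a little slower so that reading the dict is a plain attribute access.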
Norrius