I want to implement a to_dict function that behaves similarly to the built-in __dict__ attribute but allows me to add custom logic. (It is used to construct a pandas DataFrame; see the example below.) However, I found that my to_dict function is ~25% slower than __dict__ even when they do exactly the same thing. How can I improve my code?
import pandas as pd

class Foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def to_dict(self):
        return {
            'a': self.a,
            'b': self.b,
            'c': self.c,
            'd': self.d,
        }

list_test = [Foo(i, i, i, i) for i in range(100000)]
%%timeit
pd.DataFrame(t.to_dict() for t in list_test)
# Output: 199 ms ± 4.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
pd.DataFrame(t.__dict__ for t in list_test)
# Output: 156 ms ± 948 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
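To check whether the gap comes from the dictionary step itself rather than from pandas, the two approaches can be timed on their own. This is only a minimal sketch using the standard-library timeit module; the smaller list size and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine.

```python
import timeit


class Foo:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d

    def to_dict(self):
        # Builds a brand-new dict on every call (four attribute lookups).
        return {'a': self.a, 'b': self.b, 'c': self.c, 'd': self.d}


# Smaller list than in the question so the check runs quickly (arbitrary size).
objs = [Foo(i, i, i, i) for i in range(10_000)]

# Sanity check: both approaches yield equal dictionaries.
via_method = [t.to_dict() for t in objs]
via_attr = [t.__dict__ for t in objs]
same = via_method == via_attr

# Time only the dictionary step, without pandas.
# Note: __dict__ returns the instance's existing attribute dict, not a copy.
t_method = timeit.timeit(lambda: [t.to_dict() for t in objs], number=20)
t_attr = timeit.timeit(lambda: [t.__dict__ for t in objs], number=20)
print(f"to_dict: {t_method:.3f} s, __dict__: {t_attr:.3f} s")
```

If the difference shows up here as well, it is presumably due to the extra method call and the construction of a new dict, not to anything pandas does.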
A digression from this question, but related to my final goal: what is the most efficient way to construct a pandas DataFrame from a list of custom objects? My current approach is taken from https://stackoverflow.com/a/54975755/1087924