89

I have an array of objects of this class:

class CancerDataEntity(Model):

    age = columns.Text(primary_key=True)
    gender = columns.Text(primary_key=True)
    cancer = columns.Text(primary_key=True)
    deaths = columns.Integer()
    ...

When printed, the array looks like this:

[CancerDataEntity(age=u'80-85+', gender=u'Female', cancer=u'All cancers (C00-97,B21)', deaths=15306), CancerDataEntity(...

I want to convert this to a data frame so I can work with it in a way that suits me better: aggregate, count, sum and similar. I would like the data frame to look something like this:

     age     gender     cancer     deaths
0    80-85+  Female     ...        15306
1    ...

Is there a way to achieve this using numpy/pandas easily, without manually processing the input array?

ezamur

6 Answers

107

A much cleaner way to do this is to define a to_dict method on your class and then use pandas.DataFrame.from_records:

class Signal(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def to_dict(self):
        return {
            'x': self.x,
            'y': self.y,
        }

e.g.

In [87]: signals = [Signal(3, 9), Signal(4, 16)]

In [88]: pandas.DataFrame.from_records([s.to_dict() for s in signals])
Out[88]:
   x   y
0  3   9
1  4  16
OregonTrail
  • 2
    Great answer! Note, however, that I get the same results without using `from_records`: `pandas.DataFrame([s.to_dict() for s in signals])` – ChaimG Mar 17 '17 at 05:37
  • 31
    For simple classes without any `__dict__` trickery, this can be simplified to `pandas.DataFrame([vars(s) for s in signals])` without implementing a custom `to_dict` function. – Jim Hunziker Mar 09 '18 at 15:59
58

Just use:

DataFrame([o.__dict__ for o in my_objs])

Full example:

import pandas as pd

# define some class
class SomeThing:
    def __init__(self, x, y):
        self.x, self.y = x, y

# make an array of the class objects
things = [SomeThing(1,2), SomeThing(3,4), SomeThing(4,5)]

# fill dataframe with one row per object, one attribute per column
df = pd.DataFrame([t.__dict__ for t in things ])

print(df)

This prints:

   x  y
0  1  2
1  3  4
2  4  5
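
As a hedged side note on what `__dict__` picks up: it holds only the attributes stored on the instance itself. Attributes assigned in a parent or a child `__init__` both appear, while class-level attributes do not. The sketch below illustrates this; `Base`, `Child` and `kind` are hypothetical names made up for the illustration.

```python
import pandas as pd


class Base:
    kind = "base"  # class-level attribute: not stored in the instance __dict__

    def __init__(self, x):
        self.x = x  # instance attribute set by the parent


class Child(Base):
    def __init__(self, x, y):
        super().__init__(x)
        self.y = y  # instance attribute set by the child


children = [Child(1, 2), Child(3, 4)]

# one row per object; columns come from the instance attributes only
df = pd.DataFrame([c.__dict__ for c in children])
print(df)
```

If a class-level value is needed as a column, it has to be added explicitly, for example through a custom to_dict method.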
Shital Shah
  • This works great, except it doesn't seem to work well with inherited classes. I tried to build a collection of objects that have an inherited base class, and the only attributes returned in the data frame are those from the parent class, not the child class, even though all members of the collection are from the child class. – Mark Jan 05 '21 at 13:17
  • Perfect, it even works fine if there is another object inside SomeThing – Levin Sep 18 '21 at 08:28
40

I would like to emphasize Jim Hunziker's comment.

pandas.DataFrame([vars(s) for s in signals])

It is far easier to write, less error-prone and you don't have to change the to_dict() function every time you add a new attribute.

If you want the freedom to choose which attributes to keep, the columns parameter could be used.

pandas.DataFrame([vars(s) for s in signals], columns=['x', 'y'])

The downside is that vars() is shallow, so complex attributes (for example nested objects) end up as a single object-valued column instead of being expanded, though that should rarely be the case.
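
To illustrate the point about complex attributes, here is a minimal sketch assuming the nested objects are dataclasses. `Inner` and `Outer` are hypothetical classes invented for this example. It compares the shallow `vars` approach with one possible workaround: `dataclasses.asdict`, which recurses into nested dataclasses, combined with `pandas.json_normalize`, which flattens nested dicts into dotted column names.

```python
from dataclasses import dataclass, asdict

import pandas as pd


@dataclass
class Inner:
    a: int
    b: int


@dataclass
class Outer:
    inner: Inner
    c: float


items = [Outer(Inner(1, 2), 0.5), Outer(Inner(3, 4), 1.5)]

# vars() is shallow: each Inner instance stays as a single object in the "inner" column
shallow = pd.DataFrame([vars(o) for o in items])

# asdict() converts nested dataclasses to nested dicts,
# and json_normalize expands those into columns such as "inner.a"
flat = pd.json_normalize([asdict(o) for o in items])

print(shallow)
print(flat)
```

This is only one way to handle nesting; for plain (non-dataclass) objects the nested dicts would have to be built by hand.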

typhon04
  • You are the man. This is the absolute best one-liner solution I found after searching many threads! – Andrej Aug 09 '20 at 16:49
  • It's good, but what if a dataclass contains another dataclass instance? E.g. `class A: a:int; b:int` ... `class B: a:A; c:float` and you want `pd.DataFrame(..., columns=["a", "b", "c"])`. – Jan Mar 23 '23 at 21:27
27

Code that leads to desired result:

variables = arr[0].keys()
df = pd.DataFrame([[getattr(i,j) for j in variables] for i in arr], columns = variables)

Thanks to @Serbitar for pointing me in the right direction.
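
Note that `arr[0]` raises an IndexError when the list is empty. A minimal sketch of one way around that, assuming the attribute names are known up front, is to pass them in explicitly instead of reading them from the first element. The helper `objects_to_frame` and the `Point` class are hypothetical names used only for illustration.

```python
import pandas as pd


def objects_to_frame(objs, fields):
    """Build a DataFrame with one row per object and one column per named attribute."""
    rows = [[getattr(obj, name) for name in fields] for obj in objs]
    return pd.DataFrame(rows, columns=list(fields))


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


fields = ["x", "y"]
filled = objects_to_frame([Point(1, 2), Point(3, 4)], fields)
empty = objects_to_frame([], fields)  # no IndexError: columns are known in advance

print(filled)
print(empty)
```

With an empty input this returns an empty data frame that still has the expected columns, so later aggregation code does not need a special case.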

ezamur
  • 1
    This will break if arr is an empty list. @typhon04's answer returns an empty dataframe for an empty arr – esantix Sep 26 '22 at 20:31
13

Try:

variables = list(array[0].keys())
dataframe = pandas.DataFrame([[getattr(i,j) for j in variables] for i in array], columns = variables)
Serbitar
  • 2
    http://meta.stackoverflow.com/questions/262695/new-answer-deletion-option-code-only-answer – ivan_pozdeev Jan 25 '16 at 17:42
  • I guess I should not accept the answer as true since I had to tweak it to make it work, but I am upvoting it since it pointed me in the right direction. – ezamur Jan 25 '16 at 20:52
2

For anyone working with Python 3.7+ dataclasses, this can be done very elegantly using the standard library's asdict; based on OregonTrail's example:

from dataclasses import dataclass, asdict

import pandas

@dataclass
class Signal:
  x: float
  y: float

signals = [Signal(3, 9), Signal(4, 16)]
pandas.DataFrame.from_records([asdict(s) for s in signals])

This yields the correct DataFrame without the need for any custom methods, dunder methods, bare vars, or getattr:

   x   y
0  3   9
1  4  16