Converting a list of objects to a pandas dataframe

Question

How do I convert a list of objects to a pandas dataframe?

class Person(object):
    def __init__(self):
        self.name = ""
        self.year = 0
        self.salary = 0

The example below works, but I want to start from a list of Person objects instead.

import pandas as pd
import numpy as np

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
    'year': [2017, 2017, 2017, 2017, 2017],
    'salary': [40000, 24000, 31000, 20000, 30000]}

df = pd.DataFrame(data, index = ['Acme', 'Acme', 'Bilbao', 'Bilbao', 'Bilbao'])

print(df)

Like this? question is unclear I think: `data = {'persons': [Person() for _ in range(5)]}` — Anton vBR, Dec 03 '17 at 20:58
Use a list comprehension? `data = [{'name': person.name, 'year': person.year, 'salary': person.salary} for person in person_list]` — ayhan, Dec 03 '17 at 20:58
Sorry for the confusion. I have a list of person objects and I want to create a dataframe from that such that the data frames columns are the person's attributes. How can I do that? — im281, Dec 03 '17 at 22:41

score 32 · Accepted Answer · answered Dec 04 '17 at 01:12

Sort of a combination of ayhan's suggestion and what you seem to want -- you can add a method to your Person class that transforms it into something that fits the Pandas DataFrame constructor.

class Person(object):
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary

    def as_dict(self):
        return {'name': self.name, 'year': self.year, 'salary': self.salary}

person1 = Person('john', 2017, 100)
person2 = Person('smith', 2016, 200)
person3 = Person('roger', 2016, 500)

person_list = [person1, person2, person3]

df = pd.DataFrame([x.as_dict() for x in person_list])

print(df)

    name    salary  year
0   john    100     2017
1   smith   200     2016
2   roger   500     2016
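In the output above, salary comes before year because the columns were not listed explicitly. A minimal sketch of pinning the order, assuming the same Person class as in the answer and using the built-in vars in place of as_dict (a substitution made here for brevity, not part of the answer's code):

```python
import pandas as pd


class Person:
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary


person_list = [Person('john', 2017, 100), Person('smith', 2016, 200)]

# vars(p) returns the instance's attribute dictionary (p.__dict__).
records = [vars(p) for p in person_list]

# Listing the columns explicitly fixes their order, whatever order
# the dictionaries themselves happen to have.
df = pd.DataFrame(records, columns=['name', 'year', 'salary'])
print(df)
```

Dictionaries keep insertion order from Python 3.7 on, and newer pandas releases generally follow that order, so the explicit columns argument matters most on older setups or when a specific layout is required.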

Ido S

Why are the column orders different? It should be: – im281 Dec 04 '17 at 01:21
how do you set the column orders? The name should be the first column, year should be in the middle and salary last – im281 Dec 04 '17 at 01:22
pass the `columns` arg to the constructor: `df=pd.DataFrame([x.as_dict() for x in person_list], columns=['name', 'year', 'salary'])` – Ido S Dec 04 '17 at 01:35
also, for completeness, you could use the built-in python `vars` function instead of defining `as_dict`: `[vars(x) for x in person_list]`. – Ido S Dec 04 '17 at 01:41
as_dict() does pretty much the same as __dict__ so `pd.DataFrame([x.__dict__ for x in person_list])` is another option (well, in the Python version I tried) – stijn Nov 22 '18 at 16:21
@im281 the column order has nothing to do with the order in which the dictionary was constructed. dict is a hashing structure so the order is determined internally by its hash function implementation. one should not expect any kind of order at the surface. – axolotl Dec 24 '18 at 02:46
`vars(x)` instead of `x.__dict__` is even more pythonic, as noted [here](https://stackoverflow.com/questions/61517/python-dictionary-from-an-objects-fields) – Anthony Townsend Mar 27 '22 at 01:54

chrischma · Answer 2 · 2022-10-31T11:26:52.490

5

You can create a pandas dataframe from a list of objects by using vars, as long as each object stores its attributes in a __dict__.

import pandas as pd

# data is the list of objects to convert
df = pd.DataFrame([vars(d) for d in data])

This works because vars returns each object's instance attributes as a dictionary, and the DataFrame constructor turns each dictionary into one row. Enjoy!
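A minimal sketch of where vars applies and where it does not. The SlottedPerson class is a hypothetical variant added for illustration: objects that define __slots__ have no __dict__, so vars raises TypeError and the attribute names have to be read another way.

```python
import pandas as pd


class Person:
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary


class SlottedPerson:
    # Hypothetical variant: __slots__ removes the per-instance __dict__.
    __slots__ = ('name', 'year', 'salary')

    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary


# Ordinary instances: vars() works directly.
data = [Person('john', 2017, 100), Person('smith', 2016, 200)]
df = pd.DataFrame([vars(d) for d in data])

# Slotted instances: vars() raises TypeError because there is no __dict__.
slotted = [SlottedPerson('roger', 2016, 500)]
try:
    vars(slotted[0])
    vars_worked = True
except TypeError:
    vars_worked = False

# Fall back to reading the names listed in __slots__.
df_slotted = pd.DataFrame(
    [{name: getattr(p, name) for name in p.__slots__} for p in slotted]
)
print(df)
print(df_slotted)
```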

edited Oct 31 '22 at 11:26

answered May 31 '22 at 16:04


Milo · Answer 3 · 2017-12-03T21:18:58.397

First of all, you should modify your __init__(), since your version sets every attribute of every Person object to a default value and does not let the caller set them.

You can then use the zip() function to create triples of the values in your data dictionary and use those triples to create Person instances:

import pandas as pd

class Person:
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary

data = {'name': ['Alice', 'Bob', 'Charles', 'David', 'Eric'],
        'year': [2017, 2017, 2017, 2017, 2017],
        'salary': [40000, 24000, 31000, 20000, 30000]}

foo = [Person(name, year, salary) for name, year, salary in zip(data['name'], data['year'], data['salary'])]
df = pd.DataFrame(foo, index=['Acme']*2 + ['Bilbao']*3, columns=['Person'])

first_person = df['Person'].iloc[0]
print(first_person.name, first_person.year, first_person.salary)

Output:

Alice 2017 40000
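The frame above keeps whole Person objects in a single column. If one column per attribute is wanted instead, a possible follow-up step is to expand that column with vars while reusing the index. This is a sketch under the same class definition; the names objects_df and expanded are made up for illustration, and only three people are used to keep it short.

```python
import pandas as pd


class Person:
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary


people = [
    Person('Alice', 2017, 40000),
    Person('Bob', 2017, 24000),
    Person('Charles', 2017, 31000),
]
index = ['Acme', 'Acme', 'Bilbao']

# One object-typed column holding the Person instances themselves.
objects_df = pd.DataFrame({'Person': people}, index=index)

# Expand each object into its attributes, keeping the original index.
expanded = pd.DataFrame(
    [vars(p) for p in objects_df['Person']],
    index=objects_df.index,
    columns=['name', 'year', 'salary'],
)
print(expanded)
```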

Thon Deboer · Answer 4 · 2021-12-03T03:48:20.837

How about this?

This takes all the (first-level) public attributes and puts them into a dictionary that can be loaded directly into a Pandas DataFrame, which is what I thought OP was looking for. It also avoids having to change the class.

The `not attr.startswith('_')` check is there to avoid loading private attributes into the Pandas DataFrame.

import pandas as pd

class Person(object):
    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary

person1 = Person('john', 2017, 100)
person2 = Person('smith', 2016, 200)
person3 = Person('roger', 2016, 500)

person_list = [person1, person2, person3]

data = [{attr: getattr(p, attr) for attr in dir(p) if not attr.startswith('_')} for p in person_list]
df = pd.DataFrame(data)
print(df)

    name  salary  year
0   john     100  2017
1  smith     200  2016
2  roger     500  2016
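One thing to watch with dir(): it also lists methods and class-level attributes, not only the values set in __init__. A hedged sketch of filtering out callables follows; the species attribute, the greet method, and the public_fields helper are hypothetical additions for illustration and are not part of the code above.

```python
import pandas as pd


class Person:
    species = 'human'  # hypothetical class attribute; dir() lists it too

    def __init__(self, name='', year=0, salary=0):
        self.name = name
        self.year = year
        self.salary = salary

    def greet(self):  # hypothetical method; dir() lists this as well
        return 'hi ' + self.name


def public_fields(obj):
    """Return the public, non-callable attributes that dir() reports."""
    return {
        attr: getattr(obj, attr)
        for attr in dir(obj)
        if not attr.startswith('_') and not callable(getattr(obj, attr))
    }


person_list = [Person('john', 2017, 100), Person('smith', 2016, 200)]
df = pd.DataFrame([public_fields(p) for p in person_list])
print(df)
```

The method greet is dropped by the callable check, while the class attribute species still becomes a column. If only per-instance values are wanted, vars(p) is the narrower choice.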

You should add some more information to your solution. – Frank Dec 01 '21 at 17:49
