Pandas DataFrame from Dictionary with Lists

Question

I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists-of-lists or lists-of-dictionaries).

When I throw the dictionary into pd.DataFrame to try to convert it to a pandas DataFrame, it throws a "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys which have values of lists).

How do I get pandas to treat the lists as 'single values'?

As a hypothetical example:

data = { 'building': 'White House', 'DC?': True,
         'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }

I want to turn it into a DataFrame like this:

ix   building         DC?      occupants
0    'White House'    True     ['Barack', 'Michelle', 'Sasha', 'Malia']

In general, yes. In the hypothetical example, the 'building' will always be a single string, and the 'DC?' will always be a single boolean. But the length of the 'occupants' list may change depending on the building that is queried. Does that answer your question? — Conway, Nov 03 '15 at 17:01
correct, so i was thinking, create a blank dataframe first and then keep adding rows to it. however, note that as Andy points out, this may be ineffcient — AbtPst, Nov 03 '15 at 17:03

score 33 · Accepted Answer · answered Nov 03 '15 at 16:50

33

This works if you pass a list (of rows):

In [11]: pd.DataFrame(data)
Out[11]:
    DC?     building occupants
0  True  White House    Barack
1  True  White House  Michelle
2  True  White House     Sasha
3  True  White House     Malia

In [12]: pd.DataFrame([data])
Out[12]:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]

answered Nov 03 '15 at 16:50

Andy Hayden

359,921
101
625
535

1

This solution also works with lists-of-lists and lists-of-dictionaries. `data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] , 'list_of_lists': [[1,2,3], [4,5,6]], 'list_of_dicts': [{'a': 1, 'b': 2}, {'c': 3, 'd': 4}]}` – Alexander Nov 03 '15 at 17:10
It did not work for me where I have 3 lists of the same length for each dictionary key.```nutrients_totals_dict = { "Name" : cnf_nutrient_names_all, "Value": ["0" for i in range(num_values)], "Units": cnf_nutrient_units_all }``` – mLstudent33 Nov 07 '20 at 05:29

score 6 · Answer 2 · answered Nov 03 '15 at 17:41

6

This turns out to be very trivial in the end

data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
df = pandas.DataFrame([data])
print df

Which results in:

    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]

answered Nov 03 '15 at 17:41

Chinmay Kanchi

62,729
22
87
114

This works for python 3.x. The pandas from_dict behaviour seems to have changed. – BenP Dec 27 '18 at 12:29
Hi @ChinmayKanchi, what if there are more than one lines of such data, eg. like 2nd house with another list of [names x multiple] ? – Xavier Sun Jun 02 '20 at 02:27

BSalita · Answer 3 · 2021-05-09T16:35:40.030

2

Solution to make dataframe from dictionary of lists where keys become a sorted index and column names are provided. Good for creating dataframes from scraped html tables.

d = { 'B':[10,11], 'A':[20,21] }
df = pd.DataFrame(d.values(),columns=['C1','C2'],index=d.keys()).sort_index()
df

    C1  C2
A   20  21
B   10  11

edited May 09 '21 at 16:35

answered May 09 '21 at 16:26

BSalita

8,420
10
51
68

score 1 · Answer 4 · answered Nov 03 '15 at 16:50

Would it be acceptable if instead of having one entry with a list of occupants, you had individual entries for each occupant? If so you could just do

n = len(data['occupants'])
for key, val in data.items():
    if key != 'occupants':
        data[key] = n*[val]

EDIT: Actually, I'm getting this behavior in pandas (i.e. just with pd.DataFrame(data)) even without this pre-processing. What version are you using?

Thor · Answer 5 · 2021-06-17T15:59:54.933

I had a closely related problem, but my data structure was a multi-level dictionary with lists in the second level dictionary:

result = {'hamster': {'confidence': 1, 'ids': ['id1', 'id2']},
          'zombie': {'confidence': 1, 'ids': ['id3']}}

When importing this with pd.DataFrame([result]), I end up with columns named hamster and zombie. The (for me) correct import would be to have these as row titles, and confidence and ids as column titles. To achieve this, I used pd.DataFrame.from_dict:

In [42]: pd.DataFrame.from_dict(result, orient="index")
Out[42]:
         confidence         ids
hamster           1  [id1, id2]
zombie            1       [id3]

This works for me with python 3.8 + pandas 1.2.3.

score -1 · Answer 6 · answered Nov 03 '15 at 16:47

-1

if you know the keys of the dictionary beforehand, why not first create an empty data frame and then keep adding rows?

answered Nov 03 '15 at 16:47

AbtPst

7,778
17
91
172

2

This is not very efficient as pandas creates a new copy for every insertion (so building up a DataFrame in this way is O(n^2) in number of rows). – Andy Hayden Nov 03 '15 at 16:52

Pandas DataFrame from Dictionary with Lists

6 Answers6

Linked