5

In a large pandas Dataframe, I have three columns (fruit, vegetable, and first_name). The values of these columns are lists.

From the lists, I want to create one new column with a list of dictionaries for each row of the DataFrame.

I have three columns (fruit, vegetable, and first_name) with each row having lists as their values.

First row of my dataframe:

df = pd.DataFrame({
 "fruit": [["Apple", "Banana","Pear","Grape","Pineapple"]],
 "vegetable": [["Celery","Onion","Potato","Broccoli","Sprouts"]],
 "first_name": [["Sam", "Beth", "John", "Daisy", "Jane"]]
})

How do I transform the three columns to one column and have the value look like this instead?

[
   {"fruit": "Apple", "vegetable":"Celery", "first_name":"Sam"}, 
   {"fruit": "Banana", "vegetable":"Onion", "first_name":"Beth"},
   {"fruit": "Pear", "vegetable":"Potato", "first_name":"John"},
   {"fruit": "Grape", "vegetable":"Broccoli", "first_name":"Daisy"},
   {"fruit": "Pineapple", "vegetable":"Sprouts", "first_name":"Jane"}
]
Zephyr
  • 11,891
  • 53
  • 45
  • 80
Kevin
  • 97
  • 1
  • 6

3 Answers3

8

IIUC you can do it with (1) .explode() and (2) .to_dict()

df.apply(pd.Series.explode).to_dict(orient='records')
#output:
[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
 {'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
 {'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
 {'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
 {'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
Terry
  • 2,761
  • 2
  • 14
  • 28
  • 1
    Thanks I used your suggestion in combination with this to get my new column: pd.DataFrame({'party_items':df.apply(pd.Series.explode).to_dict(orient='records')}) – Kevin Jul 21 '20 at 19:15
0

You can also create the exploded DataFrame using to_dict and then calling pd.DataFrame. It will be a bit faster for smaller lists, but is essentially the same once you have 10,000+ items.

pd.DataFrame(df.iloc[0].to_dict()).to_dict('records')

[{'fruit': 'Apple', 'vegetable': 'Celery', 'first_name': 'Sam'},
 {'fruit': 'Banana', 'vegetable': 'Onion', 'first_name': 'Beth'},
 {'fruit': 'Pear', 'vegetable': 'Potato', 'first_name': 'John'},
 {'fruit': 'Grape', 'vegetable': 'Broccoli', 'first_name': 'Daisy'},
 {'fruit': 'Pineapple', 'vegetable': 'Sprouts', 'first_name': 'Jane'}]
ALollz
  • 57,915
  • 7
  • 66
  • 89
0

the major issue to take care of is to flatten the value for each value in the dictionary. A rather manual implementation is:

for i in ["fruit","vegetable","first_name"]:    
        flat_list = [item for sublist in df[i] for item in sublist]    
        list.append(flat_list)
        
        list_of_dic = [] for i in range(5):    
            dic = {}    
            dic["furit"] = list[0][i]    
            dic["vegetable"] = list[1][i]
            dic["first_name"] = list[2][i]
            list_of_dic.append(dic) 
    
       
        
Shawn
  • 11
  • 5