Pandas Dataframe split multiple key values to different columns

Question

I have a dataframe column with the following format:

col1    col2   
 A     [{'Id':42,'prices':['30',’78’]},{'Id': 44,'prices':['20','47',‘89’]}]
 B     [{'Id':47,'prices':['30',’78’]},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]

How can I transform it to the following ?

col1    Id            price   
  A     42         ['30',’78’]
  A     44         ['20','47',‘89’]
  B     47         ['30',’78’]
  B     94         ['20']
  B     84         ['20','98']

I was thinking of using apply and lambda as a solution but I am not sure how.

Edit : In order to recreate this dataframe I use the following code :

data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"], 
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]] 

df = pd.DataFrame(data, columns = ['col1', 'col2'])

kindly output ur data with to_dict – sammywemmy May 26 '20 at 10:03 — sammywemmy, May 26 '20 at 10:03

jezrael · Accepted Answer · 2020-05-26T10:43:44.767

5

Solution if there are lists in column col2:

print (type(df['col2'].iat[0]))
<class 'list'>

L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in b]

df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

If there are strings:

print (type(df['col2'].iat[0]))
<class 'str'>

import ast

L = [{**{'col1': a}, **x} for a, b in df[['col1','col2']].to_numpy() for x in ast.literal_eval(b)]
df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

For better understanding is possible use:

import ast

L = []
for a, b in df[['col1','col2']].to_numpy():
    for x in ast.literal_eval(b):
        d = {'col1': a}
        out = {**d, **x}
        L.append(out)

df = pd.DataFrame(L)
print (df)
  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

edited May 26 '20 at 10:43

answered May 26 '20 at 10:12

jezrael

822,522
95
1,334
1,252

What do the ** mean ? – colla May 26 '20 at 10:17
@colla - It is for merge 2 dictionaries - `z = {**x, **y}`, [link](https://stackoverflow.com/questions/38987/how-do-i-merge-two-dictionaries-in-a-single-expression-in-python) – jezrael May 26 '20 at 10:18
Your answers says I have a syntax error in the following : [{'Id':47,'prices':['30','78'']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}] Any ideas why it's doing so ? – colla May 26 '20 at 10:20
@colla - There is problem in data, it seems some data are messy. Is possible share input data, best json if possible? – jezrael May 26 '20 at 10:22
I edited my question so you can recreate my data easily – colla May 26 '20 at 10:24
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/214662/discussion-between-colla-and-jezrael). – colla May 26 '20 at 10:24
What if the data has date time in col 2? https://stackoverflow.com/questions/63729933/how-do-i-convert-an-array-of-values-from-data-frame-column-to-new-columns-using – Amogh Katwe Sep 03 '20 at 20:01

score 2 · Answer 2 · answered May 26 '20 at 12:06

Considering second parameter of "data" as list.

data= [
  ['A', [{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]], 
  ['B', [{'Id':47,'prices':['30','78']}, {'Id':94,'prices':['20']},{'Id':84,'prices': 
        ['20','98']}]]
  ]

t_list = []

for i in range(len(data)):
    for j in range(len(data[i][1])):
        t_list.append((data[i][0], data[i][1][j]['Id'], data[i][1][j]['prices']))

df = pd.DataFrame(t_list, columns=['col1', 'id', 'price'])
print(df)

     col1  id         price
0    A     42      [30, 78]
1    A     44  [20, 47, 89]
2    B     47      [30, 78]
3    B     94          [20]
4    B     84      [20, 98]

Ch3steR · Answer 3 · 2020-05-26T19:03:21.570

You can use df.explode here with pd.Series.apply and df.set_index and df.reset_index

df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()

  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

When col2 is string, use ast.literal_eval

import ast

data = [['A', "[{'Id':42,'prices':['30','78']},{'Id': 44,'prices':['20','47','89']}]"], 
        ['B', "[{'Id':47,'prices':['30','78']},{'Id':94,'prices':['20']},{'Id':84,'prices':['20','98']}]"]] 

df = pd.DataFrame(data, columns = ['col1', 'col2'])
df['col2'] = df['col2'].map(ast.literal_eval)

df.set_index('col1').explode('col2')['col2'].apply(pd.Series).reset_index()

  col1  Id        prices
0    A  42      [30, 78]
1    A  44  [20, 47, 89]
2    B  47      [30, 78]
3    B  94          [20]
4    B  84      [20, 98]

Pandas Dataframe split multiple key values to different columns

3 Answers3