How to expand array type column in Pandas to individual columns

Question

I have a df, sample data looks like below:

df

user    collaborators
A       {'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}
B       {'first_name' : 'tom', 'last_name' : 'pwc', 'id' : 'd99@email.com'}
C       {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}, {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}

I need to get the first_name, last_name and id in individual columns. Expected Output:

I first tried to remove the square brackets using:

df['collaborators'].str.strip('{}')
df['collaborators'].str.replace('[{}]', '')

But I got only NaN as the result, not sure why. I thought of exploding the column after converting it to a list and extracting first_name, last_name and id. But all this looks like lot of steps. Could someone please let me know if there's a shorter way.

Updated DF:

I used to_list to convert the three columns into list:

df['first_name'] = df['first_name'].to_list()
df['last_name'] = df['last_name'].to_list()
df['id'] = df['id'].to_list()

I then tried below code from SO:

 df.set_index('collaborators').apply(lambda x:
    x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

But it didn't work for me.

https://stackoverflow.com/questions/46391291/how-to-convert-json-data-inside-a-pandas-column-into-new-columns?msclkid=a6deeed1cc4911ecbfbf8a32d7c734e2 — Kirk, May 05 '22 at 08:04
What exactly is your data? Can you load it into pandas, apply `.values` on it and paste the output here? — Zero, May 05 '22 at 08:13

score 1 · Accepted Answer · answered May 05 '22 at 08:03

1

Use ast.literal_eval with add [] for lists of DataFrames, so possible use DataFrame.explode with json_normalize and last append to original with remove helper column new in DataFrame.pop:

import ast

df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))

df1 = df.explode('new').reset_index(drop=True)

df = df1.join(pd.json_normalize(df1.pop('new')))
print (df)

  user                                      collaborators first_name  \
0    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       jack   
1    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       john   
2    B  {'first_name' : 'tom', 'last_name' : 'pwc', 'i...        tom   
3    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   
4    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   

  last_name             id  
0       abc  a12@email.com  
1       pqr  d23@email.com  
2       pwc  d99@email.com  
3       def  x45@email.com  
4       def  x45@email.com

answered May 05 '22 at 08:03

jezrael

822,522
95
1,334
1,252

When I run on my data, I am getting this error: could you please let me know how to fix this: `ValueError: malformed node or string: ` – Karthik S May 05 '22 at 08:43
@KarthikS - what is `print (list(df.loc[0, 'collaborators']))` ? – jezrael May 05 '22 at 08:45
I am getting the the array within a list. Apologies can't paste the actual output as it's company data. But for example it's looks like this: `[{'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}]` – Karthik S May 05 '22 at 08:47
looking at the error, I am not able to figure out the issue. Could you please let me know if there's another way – Karthik S May 05 '22 at 08:57
@KarthikS - Change `df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))` to `df['new'] = df['collaborators']` ;) – jezrael May 05 '22 at 08:58
Tried, but it's returning the column as is. Not getting the new columns – Karthik S May 05 '22 at 09:10
@KarthikS can you try `df['new'] = df['collaborators'].apply(ast.literal_eval)` ? – jezrael May 05 '22 at 09:14
Same error :`ValueError: malformed node or string: ` – Karthik S May 05 '22 at 09:16
@KarthikS - is possible use [this](https://stackoverflow.com/questions/57975173/how-to-extract-values-from-json-like-text/57976155#57976155) ?"Like `df['new'] = df['collaborators'].apply(literal_eval_cust)` ? If not possble parse generate `{}` – jezrael May 05 '22 at 09:20
Yes it worked, but not seeing the new columns, do I need to remove the `df1.pop('new'))` part? – Karthik S May 05 '22 at 09:28
@KarthikS - unfortuntaely it means all values are wrong, so generate `{}` It can test `print (df['new'])` after `df['new'] = df['collaborators'].apply(literal_eval_cust)` – jezrael May 05 '22 at 09:29
@KarthikS - So is possible test data which failed, e.g. first 2 rows by `print (df['collaborators'].head(2).tolist())` ? – jezrael May 05 '22 at 09:36
Let me extract the data from the DB differently and see. – Karthik S May 05 '22 at 09:46
Have updated my question, can you please help me how to get the desired result now. That ie to explode the first_name, last_name and id columns so as to have a single value in each row. – Karthik S May 05 '22 at 10:09
@KarthikS - so what do code `df['first_name'] = df['first_name'].to_list()` ? Not working same if omit `df['first_name'] = df['first_name'].to_list()` ? Maybe you want share `print (df['collaborators'].head(2).tolist())` with real data for test. – jezrael May 05 '22 at 10:11
Looks like my `to_list` hasn't worked the dtypes of all columns is still `object` and not `list` – Karthik S May 05 '22 at 12:08
1

No worries, have figured out, thanks a bunch for your patience and help – Karthik S May 05 '22 at 12:38

keramat · Answer 2 · 2022-05-05T16:35:41.103

0

Use:

(df[0]+",").apply(eval).explode().apply(pd.Series)

Output:

Complete code:

data="""{'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'},  {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}
{'first_name' : 'tom', 'last_name' : 'pwc', 'id' : 'd99@email.com'}
{'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'},  {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}"""
df = pd.DataFrame([x for x in data.split('\n')])#
(df[0]+",").apply(eval).explode().apply(pd.Series)

edited May 05 '22 at 16:35

answered May 05 '22 at 08:13

keramat

4,328
6
25
38

1

https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice – jezrael May 05 '22 at 08:19

How to expand array type column in Pandas to individual columns

2 Answers2