2

I have a df, sample data looks like below:

df

user    collaborators
A       {'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}
B       {'first_name' : 'tom', 'last_name' : 'pwc', 'id' : 'd99@email.com'}
C       {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}, {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}

I need to get the first_name, last_name and id in individual columns. Expected Output:

enter image description here

I first tried to remove the square brackets using:

df['collaborators'].str.strip('{}')
df['collaborators'].str.replace('[{}]', '')

But I got only NaN as the result, not sure why. I thought of exploding the column after converting it to a list and extracting first_name, last_name and id. But all this looks like lot of steps. Could someone please let me know if there's a shorter way.

Updated DF:

enter image description here

I used to_list to convert the three columns into list:

df['first_name'] = df['first_name'].to_list()
df['last_name'] = df['last_name'].to_list()
df['id'] = df['id'].to_list()

I then tried below code from SO:

 df.set_index('collaborators').apply(lambda x:
    x.apply(pd.Series).stack()).reset_index().drop('level_1', 1)

But it didn't work for me.

Karthik S
  • 11,348
  • 2
  • 11
  • 25
  • have a look at pd.json_normalize() or df.explode() – Bendik Knapstad May 05 '22 at 07:55
  • https://stackoverflow.com/questions/46391291/how-to-convert-json-data-inside-a-pandas-column-into-new-columns?msclkid=a6deeed1cc4911ecbfbf8a32d7c734e2 – Kirk May 05 '22 at 08:04
  • What exactly is your data? Can you load it into pandas, apply `.values` on it and paste the output here? – Zero May 05 '22 at 08:13

2 Answers2

1

Use ast.literal_eval with add [] for lists of DataFrames, so possible use DataFrame.explode with json_normalize and last append to original with remove helper column new in DataFrame.pop:

import ast

df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))

df1 = df.explode('new').reset_index(drop=True)

df = df1.join(pd.json_normalize(df1.pop('new')))
print (df)

  user                                      collaborators first_name  \
0    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       jack   
1    A  {'first_name' : 'jack', 'last_name' : 'abc', '...       john   
2    B  {'first_name' : 'tom', 'last_name' : 'pwc', 'i...        tom   
3    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   
4    C  {'first_name' : 'jill', 'last_name' : 'def', '...       jill   

  last_name             id  
0       abc  a12@email.com  
1       pqr  d23@email.com  
2       pwc  d99@email.com  
3       def  x45@email.com  
4       def  x45@email.com  
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • When I run on my data, I am getting this error: could you please let me know how to fix this: `ValueError: malformed node or string: ` – Karthik S May 05 '22 at 08:43
  • @KarthikS - what is `print (list(df.loc[0, 'collaborators']))` ? – jezrael May 05 '22 at 08:45
  • I am getting the the array within a list. Apologies can't paste the actual output as it's company data. But for example it's looks like this: `[{'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'}, {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}]` – Karthik S May 05 '22 at 08:47
  • looking at the error, I am not able to figure out the issue. Could you please let me know if there's another way – Karthik S May 05 '22 at 08:57
  • @KarthikS - Change `df['new'] = df['collaborators'].apply(lambda x: ast.literal_eval('[' + x + ']'))` to `df['new'] = df['collaborators']` ;) – jezrael May 05 '22 at 08:58
  • Tried, but it's returning the column as is. Not getting the new columns – Karthik S May 05 '22 at 09:10
  • @KarthikS can you try `df['new'] = df['collaborators'].apply(ast.literal_eval)` ? – jezrael May 05 '22 at 09:14
  • Same error :`ValueError: malformed node or string: ` – Karthik S May 05 '22 at 09:16
  • @KarthikS - is possible use [this](https://stackoverflow.com/questions/57975173/how-to-extract-values-from-json-like-text/57976155#57976155) ?"Like `df['new'] = df['collaborators'].apply(literal_eval_cust)` ? If not possble parse generate `{}` – jezrael May 05 '22 at 09:20
  • Yes it worked, but not seeing the new columns, do I need to remove the `df1.pop('new'))` part? – Karthik S May 05 '22 at 09:28
  • @KarthikS - unfortuntaely it means all values are wrong, so generate `{}` It can test `print (df['new'])` after `df['new'] = df['collaborators'].apply(literal_eval_cust)` – jezrael May 05 '22 at 09:29
  • @KarthikS - So is possible test data which failed, e.g. first 2 rows by `print (df['collaborators'].head(2).tolist())` ? – jezrael May 05 '22 at 09:36
  • Let me extract the data from the DB differently and see. – Karthik S May 05 '22 at 09:46
  • Have updated my question, can you please help me how to get the desired result now. That ie to explode the first_name, last_name and id columns so as to have a single value in each row. – Karthik S May 05 '22 at 10:09
  • @KarthikS - so what do code `df['first_name'] = df['first_name'].to_list()` ? Not working same if omit `df['first_name'] = df['first_name'].to_list()` ? Maybe you want share `print (df['collaborators'].head(2).tolist())` with real data for test. – jezrael May 05 '22 at 10:11
  • Looks like my `to_list` hasn't worked the dtypes of all columns is still `object` and not `list` – Karthik S May 05 '22 at 12:08
  • 1
    No worries, have figured out, thanks a bunch for your patience and help – Karthik S May 05 '22 at 12:38
0

Use:

(df[0]+",").apply(eval).explode().apply(pd.Series)

Output:

enter image description here

Complete code:

data="""{'first_name' : 'jack', 'last_name' : 'abc', 'id' : 'a12@email.com'},  {'first_name' : 'john', 'last_name' : 'pqr', 'id' : 'd23@email.com'}
{'first_name' : 'tom', 'last_name' : 'pwc', 'id' : 'd99@email.com'}
{'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'},  {'first_name' : 'jill', 'last_name' : 'def', 'id' : 'x45@email.com'}"""
df = pd.DataFrame([x for x in data.split('\n')])#
(df[0]+",").apply(eval).explode().apply(pd.Series)
keramat
  • 4,328
  • 6
  • 25
  • 38