I am using the solution from this post (Split / Explode a column of dictionaries into separate columns with pandas), but my df does not change.

Here is df before code:

    number  status_timestamps
0   234234  {"created": "2020-11-30T19:44:42Z", "complete"...
1   2342    {"created": "2020-12-14T13:43:48Z", "complete"...

Here is a sample of the dictionary in that column:

{"created": "2020-11-30T19:44:42Z", 
"complete": "2021-01-17T14:20:58Z",
 "invoiced": "2020-12-16T22:55:02Z", 
 "confirmed": "2020-11-30T21:16:48Z", 
 "in_production": "2020-12-11T18:59:26Z",
 "invoice_needed": "2020-12-11T22:00:09Z",
 "accepted": "2020-12-01T00:00:23Z", 
 "assets_uploaded": "2020-12-11T17:16:53Z", 
 "notified": "2020-11-30T21:17:48Z", 
 "processing": "2020-12-11T18:49:50Z",
 "classified": "2020-12-11T18:49:50Z"}

Here is what I tried and df does not change:

df_final = pd.concat([df, df['status_timestamps'].progress_apply(pd.Series)], axis = 1).drop('status_timestamps', axis = 1)

Here is what happens in a notebook: [screenshot: jupyter_result]

SeanG

2 Answers

Next time, please provide a minimal reproducible example of what you have tried.

If I follow the solution in the mentioned post, it works.

This is the code I have used:

import pandas as pd

json_data = {"created": "2020-11-30T19:44:42Z", 
"complete": "2021-01-17T14:20:58Z",
 "invoiced": "2020-12-16T22:55:02Z", 
 "confirmed": "2020-11-30T21:16:48Z", 
 "in_production": "2020-12-11T18:59:26Z",
 "invoice_needed": "2020-12-11T22:00:09Z",
 "accepted": "2020-12-01T00:00:23Z", 
 "assets_uploaded": "2020-12-11T17:16:53Z", 
 "notified": "2020-11-30T21:17:48Z", 
 "processing": "2020-12-11T18:49:50Z",
 "classified": "2020-12-11T18:49:50Z"}
 
df = pd.DataFrame({"number": 2342, "status_timestamps": [json_data]})

# fastest solution proposed by your reference post
df.join(pd.DataFrame(df.pop('status_timestamps').values.tolist()))
Stipe
  • Not sure what else I can provide @Stipe. The example of what I tried is in the 3rd cell (`df_final = pd.concat([df, df['status_timestamps'].progress_apply(pd.Series)], axis = 1).drop('status_timestamps', axis = 1)`). Your example works as I would expect, but sourcing the `df` from the csv still does not work for me in a notebook: `import pandas as pd df = pd.read_csv('c:/STATUS_TIMESTAMPS.csv') df2 = df.join(pd.DataFrame(df.pop('status_timestamps').values.tolist())) df2` – SeanG Feb 03 '22 at 16:34
  • Well, unlike my example code, I can't simply copy, paste and run yours. That would be a good start. I don't know what goes wrong with your data, but it looks more like a problem with how you are running the code. – Stipe Feb 04 '22 at 13:40
  • Apologies. Here is data: https://drive.google.com/file/d/1kN5by0W6ffZKCRQx3RoPnOFzodJ0-L_K/view?usp=sharing Here is code: `import pandas as pd df = pd.read_csv('status_timestamps.csv') df.join(pd.DataFrame(df.pop('status_timestamps').values.tolist())) ` – SeanG Feb 04 '22 at 15:54
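I was able to use another answer from that post, but swapped its eval for the safer literal_eval. The parsing step is needed because read_csv loads status_timestamps as strings rather than dicts, so there is nothing to expand until each string is converted.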
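
A small sketch of why the df did not change when it came from a csv. The inline CSV text below is a hypothetical stand-in for the real file, built from the sample values in the question; it only assumes that the column is stored as JSON-like text.

```python
import io
import pandas as pd

# Hypothetical stand-in for status_timestamps.csv (not the real file)
csv_text = (
    'number,status_timestamps\n'
    '234234,"{""created"": ""2020-11-30T19:44:42Z"", ""complete"": ""2021-01-17T14:20:58Z""}"\n'
    '2342,"{""created"": ""2020-12-14T13:43:48Z""}"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# Each cell is a plain string, not a dict
cell_type = type(df.loc[0, "status_timestamps"]).__name__

# Expanding a list of strings gives a single column instead of one per key
expanded = pd.DataFrame(df["status_timestamps"].tolist())
expanded_columns = list(expanded.columns)

print(cell_type)
print(expanded_columns)
```

Checking the type of one cell like this is a quick way to tell whether the column needs parsing before it can be split.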

Here is working code:

import pandas as pd
from ast import literal_eval

df = pd.read_csv('c:/status_timestamps.csv')
# parse each string into a dict (literal_eval only accepts Python literals)
df["status_timestamps"] = df["status_timestamps"].apply(literal_eval)
# expand each dict into its own set of columns
df2 = df["status_timestamps"].apply(pd.Series)
df_final = pd.concat([df, df2], axis=1).drop('status_timestamps', axis=1)
df_final
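
As a variation, here is a minimal sketch that parses with json.loads and builds the new columns with the DataFrame constructor instead of apply(pd.Series). It assumes every cell is valid JSON, which holds for the sample shown in the question; the inline CSV text is a made-up stand-in for the real file.

```python
import io
import json
import pandas as pd

# Hypothetical stand-in for status_timestamps.csv (not the real file)
csv_text = (
    'number,status_timestamps\n'
    '234234,"{""created"": ""2020-11-30T19:44:42Z"", ""complete"": ""2021-01-17T14:20:58Z""}"\n'
    '2342,"{""created"": ""2020-12-14T13:43:48Z""}"\n'
)
df = pd.read_csv(io.StringIO(csv_text))

# Remove the raw column and turn each JSON string into a dict
parsed = df.pop("status_timestamps").map(json.loads)

# One column per key; keys missing from a row become NaN
df_final = df.join(pd.DataFrame(parsed.tolist(), index=df.index))
print(df_final)
```

If the cells use single quotes or other Python-only syntax, json.loads will raise an error and literal_eval is the better fit.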

SeanG