I'm reading a .parquet file that has the string column below:

{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z",
"event": "CIRCUIT_CREATION"}, 
{"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", 
"diff": [], "event": "CIRCUIT_UPDATED"}]}

I want to unnest this column but it's failing because it's a string.

This is the original dataframe: [image]

And this is how I need it: [image]

I manually did the unnest in my Jupyter Notebook with:

df = pd.concat([df.drop(['B'], axis=1), df['B'].apply(pd.Series)], axis=1)

but this works only if the column is not a string:

import pandas as pd

# Column B holds real dicts here, not strings
df = pd.DataFrame({'A': '7e1ab727-a9e9-4c00-b6dc-9e65e91b9e4f', 'B': [{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", "event": "CIRCUIT_CREATION"}, {"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", "diff": [], "event": "CIRCUIT_UPDATED"}]})
df2 = pd.DataFrame({'A': '22222222-a9e9-4c00-b6dc-9e65e91b9e4f', 'B': [{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", "event": "CIRCUIT_CREATION"}, {"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", "diff": [], "event": "CIRCUIT_UPDATED"}]})
df3 = pd.concat([df, df2])
# Expand each dict in B into its own columns
df3 = pd.concat([df3.drop(['B'], axis=1), df3['B'].apply(pd.Series)], axis=1)
df3

When I run the same code on the data read from the .parquet file, it doesn't throw an error, but the column is not unnested.

Mohammad
csegovia
    You can use ```json.loads()``` (https://www.programiz.com/python-programming/json), but I would rather recommend not handling ```parquet``` in ```pandas``` at all. ```pyspark``` would spare you this kind of issue (https://stackoverflow.com/a/47207794/11610186) – Grzegorz Skibinski Dec 02 '19 at 21:28
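
A minimal sketch of the ```json.loads()``` suggestion, in case it helps. It assumes each cell of the string column holds one complete JSON object, like the dicts in the manual example from the question. The real parquet column may be shaped differently (the sample shown ends in `]}`), so the inline dataframe here is only a hypothetical stand-in for the file's contents.

```python
import json

import pandas as pd

# Hypothetical stand-in for the parquet data: column B holds JSON *strings*,
# one object per cell (an assumption, not confirmed by the question).
df = pd.DataFrame({
    "A": ["7e1ab727-a9e9-4c00-b6dc-9e65e91b9e4f"] * 2,
    "B": [
        '{"circuitStatus": "CREATED", "startedAt": "2019-02-11T16:07:31.121Z", '
        '"event": "CIRCUIT_CREATION"}',
        '{"circuitStatus": "RUNNING", "startedAt": "2019-02-11T16:07:32.147Z", '
        '"diff": [], "event": "CIRCUIT_UPDATED"}',
    ],
})

# Parse every string into a dict before unnesting.
parsed = df["B"].apply(json.loads)

# Same unnest step as in the question, applied to the parsed column.
unnested = pd.concat(
    [df.drop(columns=["B"]), parsed.apply(pd.Series)],
    axis=1,
)
print(unnested)
```

Without the parsing step, `pd.Series` receives a plain string for each row and has no keys to expand, which would explain why no error is raised and nothing is unnested.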

0 Answers