Python 3, Pandas csv row that has key/values nested in some of the fields. How to flatten this into one wide row

Question

I'm trying to manipulate some CSV data, and normally I would use pandas when I have complex changes. However I have no idea how to deal with nested key values inside one or more CSV fields.

So in essence I have data like this,

+------+------+-------------------+------+------+
| col1 | col2 | col3              | col4 | col5 |
+------+------+-------------------+------+------+
| v    | v    | ncol1=nv,ncol2=nv | v    | v    |
+------+------+-------------------+------+------+
| v    | v    | ncol3=nv          | v    | v    |
+------+------+-------------------+------+------+
| v    | v    |                   | v    | v    |
+------+------+-------------------+------+------+

And I'm trying to get something like,

+------+------+-------+-------+-------+------+------+
| col1 | col2 | ncol1 | ncol2 | ncol3 | col4 | col5 |
+------+------+-------+-------+-------+------+------+
| v    | v    | nv    | nv    |       | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       | nv    | v    | v    |
+------+------+-------+-------+-------+------+------+
| v    | v    |       |       |       | v    | v    |
+------+------+-------+-------+-------+------+------+

check out `explode`: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.explode.html — jsmart, Aug 06 '20 at 18:00
Yea, I see the low level ways to get the data into a usable data frame, but I don't know how to actually read the data in. This looks similar to json_normalize, but with a csv. — Cyberpunkspike, Aug 06 '20 at 18:05
Can you post the CSV file? This link might be useful: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — jsmart, Aug 06 '20 at 18:08
@jsmart, this example pretty much shows exactly what you are suggesting, and what I was trying to do. https://binal.pub/2018/09/dealing-with-nested-data-in-pandas/ — Cyberpunkspike, Aug 06 '20 at 18:13
@jsmart, I don't have two nicely separated data structures to start with though, I have the csv, which is exactly as shown in my example. How do I get that nested field into a structure, from which I could use explode. — Cyberpunkspike, Aug 06 '20 at 18:14
@jsmart, BTW, I want to get concated columns, so explode is actually useless for me, it appears. — Cyberpunkspike, Aug 06 '20 at 18:30

Anant Kumar · Answer 1 · 2020-08-06T19:36:36.110

Assuming that the DataFrame Values in Column C is a comma separated string, the code does the following

Creates a dictionary from the comma separated string
Removes all null valued rows/ empty rows present in column C so that the dictionary object previously created can be expanded
Dynamically creates new columns based on the dictionary keys
Expands the Dictionary
Merges the null valued Dataframe and newly created Dataframe

import pandas as pd
import numpy as np
df=pd.DataFrame({"A":['a','b','c',],"B":['e','f','d'],"C":['D=nv,E=nv',np.nan,"D=nv"],})
#Converts string to dictionary of key-value pairs
df.loc[:,"C"]=df.loc[:,"C"].apply(lambda x: dict(map(lambda z: z.split('='),x.split(","))) if type(x)==str else np.nan)
#Drop all null values present in Column so that the dataframe can be expanded
#Separate the null and actual rows containing values into 2 separate dataframes
df_act=df.dropna(subset=["C"])
df_null=df[~df.index.isin(df_act.index)]
#Expand the Column and store in a temporary DataFrame
df_temp=df_act['C'].apply(pd.Series)
for cols in df_temp.columns:
    df_act.loc[:,cols]=np.nan
    df_null.loc[:,cols]=np.nan

#Save Contents in the actual DataFrame
df_act[df_temp.columns]=df_temp
#Drop C Column to match with Sample Output
df_act.drop("C", axis=1, inplace=True)
df_null.drop("C", axis=1, inplace=True)
#Concatenate the DataFrames
final_df=pd.concat([df_act, df_null])

Please note that the removal of C column is only done so that output matches with the sample output provided.

Python 3, Pandas csv row that has key/values nested in some of the fields. How to flatten this into one wide row

1 Answers1