Given the following DataFrame:
Category Area Country Code Function Last Name LanID Spend1 Spend2 Spend3 Spend4 Spend5
0 Bisc EE RU02,UA02 Mk Smith df3432 1.0 NaN NaN NaN NaN
1 Bisc EE RU02 Mk Bibs fdss34 1.0 NaN NaN NaN NaN
2 Bisc EE UA02,EURASIA Mk Crow fdsdr43 1.0 NaN NaN NaN NaN
3 Bisc WE FR31 Mk Ellis fdssdf3 1.0 NaN NaN NaN NaN
4 Bisc WE BE32,NL31 Mk Mower TOZ1720 1.0 NaN NaN NaN NaN
5 Bisc WE FR31,BE32,NL31 LKU Elan SKY8851 1.0 1.0 1.0 1.0 1.0
6 Bisc SE IT31 Mk Bobret 3dfsfg 1.0 NaN NaN NaN NaN
7 Bisc SE GR31 Mk Concept MOSGX009 1.0 NaN NaN NaN NaN
8 Bisc SE RU02,IT31,GR31,PT31,ES31 LKU Solar MSS5723 1.0 1.0 1.0 1.0 1.0
9 Bisc SE IT31,GR31,PT31,ES31 Mk Brix fdgd22 NaN 1.0 NaN NaN NaN
10 Choc CE RU02,CZ31,SK31,PL31,LT31 Fin Ocoser 43233d NaN 1.0 NaN NaN NaN
11 Choc CE DE31,AT31,HU31,CH31 Fin Smuth 4rewf NaN 1.0 NaN NaN NaN
12 Choc CE BG31,RO31,EMA Fin Momocs hgghg2 NaN 1.0 NaN NaN NaN
13 Choc WE FR31,BE32,NL31 Fin Bruntly ffdd32 NaN NaN NaN NaN 1.0
14 Choc WE FR31,BE32,NL31 Mk Ofer BROGX011 NaN 1.0 1.0 NaN NaN
15 Choc WE FR31,BE32,NL31 Mk Hem NZJ3189 NaN NaN NaN 1.0 1.0
16 G&C NE UA02,SE31 Mk Cre ORY9499 1.0 NaN NaN NaN NaN
17 G&C NE NO31 Mk Qlyo XVM7639 1.0 NaN NaN NaN NaN
18 G&C NE GB31,NO31,SE31,IE31,FI31 Mk Omny LOX1512 NaN 1.0 1.0 NaN NaN
I would like to get it exported into a nested Dict with the below structure:
{RU02: {Bisc: {EE: {Mkt: {Spend1: {df3432: Smith}
{fdss34: Bibs}
{Bisc: {SE: {LKU: {Spend1: {MSS5723: Solar}
{Spend2: {MSS5723: Solar}
{Spend3: {MSS5723: Solar}
{Spend4: {MSS5723: Solar}
{Spend5: {MSS5723: Solar}
{Choc: {CE: {Fin: {Spend2: {43233d: Ocoser}
.....
{UA02: {Bisc: {EE: {Mkt: {Spend1: {df3432: Smith}
{ffdsdr43: Crow}
{G&C: {NE: {Mkt: {Spend1: {ORY9499: Cre}
.....
So essentially, in this Dict I'm trying to track for each CountryCode, what is the list of LastNames+LandIDs, per Spend category (Spend1,Spend2, etc.) and their attributes (Function, Category, Area).
The DataFrame is not very large (less than 200rows), but it contains almost all types of combinations between Category/Area/Country Code as well as LastNames and their Spend categories (many-to-many).
My challenge is that i'm unable to figure out how to clearly conceptualise the steps i need to do in order to prepare the DataFrame properly for export to Dict....
What i figured so far is that i would need:
- a way to slice the contents of the "Country Code" column based on the "," separator: DONE
- create new columns based on unique Country Codes, and have 1 in each row where that column code is preset: DONE
- set the index of the DataFrame recursively to each of the newly added columns
- move into a new DataFrame each rows for each Country Code where there is data
- export all the new DataFrames to Dicts, and then merge them
Not sure if steps 3-6 is the best way to go about this though, as i'm having difficulties still to understand how pd.DataFrame.to_dict should be configured for my case (if that's even possible)...
Highly appreciate your help on the coding side, but also in briefly explaining your thought process for each stage.
Here is how far i got on my own..
#keeping track of initial order of columns
initialOrder = list(df.columns.values)
# split the Country Code by ","
CCodeNoCommas= [item for items in df['Country Code'].values for item in items.split(",")]
# add only the UNIQUE Country Codes -via set- as new columns in the DataFrame,
#with NaN for row values
df = pd.concat([df,pd.DataFrame(columns=list(set(CCodeNoCommas)))])
# reordering columns to have the newly added ones at the end
reordered = initialOrder + [c for c in df.columns if c not in initialOrder]
df = df[reordered]
# replace NaN with 1 in the newly added columns (Country Codes), where the same Country code
# exists in the initial column "Country Code"; do this for each row
CCodeUniqueOnly = set(CCodeNoCommas)
for c in CCodeUniqueOnly:
CCodeIsPresent_rowIndex = df.index[df['Country Code'].str.contains(c)]
#print (CCodeIsPresent_rowIndex)
df.loc[CCodeIsPresent_rowIndex, c] = 1
# no clue what do do next ??