I have a long dataframe I need to transform to get a wide one. The long one is :
df = pd.DataFrame({
'key' : ['E', 'E', 'E', 'E', 'J', 'J', 'J', 'J'],
'father' : ['A', 'D', 'C', 'B', 'F', 'H', 'G', 'I'],
'son' : ['B', 'E', 'D', 'C', 'G', 'I', 'H', 'J']
})
df
First thing to do, I think, is to group it by key. Then we have to find where those keys are found into the column 'son', it's the end (and last son) of the link I need to rebuild.
To rebuild the link, I need to look for his 'father'. His 'father' needs to be kept as father of final step and, also needs to be found into 'son'.
I need to iterate this until a 'father' cannot be found into the 'son' column, so it's going to be the father_0 of the link.
I think it could be done iterating those steps into a recursive function where the stop case : is 'father' not found in 'son'.
Here is the dataframe I want to get from this :
df1 = pd.DataFrame({
'key' : ['E', 'J'],
'father_1' : ['A', 'F'],
'son_1' : ['B', 'G'],
'father_2' : ['B', 'G'],
'son_2' : ['C', 'H'],
'father_3' : ['C', 'H'],
'son_3' : ['D', 'I'],
'father_4' : ['D', 'I'],
'son_4' : ['E', 'J'],
})
df1
I simplified the problem here with 2 different links of the same depth, but they could be from depth 1 to depth 10 (maybe more but rarely and unpredictably) for a lot of different keys. Here is another example of df with 2 links of different size :
df_ = pd.DataFrame({
'key' : ['E', 'E', 'E', 'E', 'K', 'K', 'K', 'K', 'K'],
'father' : ['A', 'D', 'C', 'B', 'F', 'H', 'G', 'I', 'J'],
'son' : ['B', 'E', 'D', 'C', 'G', 'I', 'H', 'J', 'K']
})
df_
df_1 = pd.DataFrame({
'key' : ['E', 'K'],
'father_1' : ['A', 'F'],
'son_1' : ['B', 'G'],
'father_2' : ['B', 'G'],
'son_2' : ['C', 'H'],
'father_3' : ['C', 'H'],
'son_3' : ['D', 'I'],
'father_4' : ['D', 'I'],
'son_4' : ['E', 'J'],
'father_5' : [np.NaN, 'J'],
'son_5' : [np.NaN, 'K']
})
df_1
Then the final step is easy, it's about taking 'father_x' and 'son_x-1' into 'step_x-1' : So the resulting dataframes for these examples would be :
df2 = pd.DataFrame({
'key' : ['E', 'J'],
'step_0' : ['A', 'F'],
'step_1' : ['B', 'G'],
'step_2' : ['C', 'H'],
'step_3' : ['D', 'I'],
'step_4' : ['E', 'J'],
})
df2
df_2 = pd.DataFrame({
'key' : ['E', 'K'],
'step_0' : ['A', 'F'],
'step_1' : ['B', 'G'],
'step_2' : ['C', 'H'],
'step_3' : ['D', 'I'],
'step_4' : ['E', 'J'],
'step_5' : [np.NaN, 'K']
})
df_2
My concerne is more about the way to aggregate the data from wide to long following the previously given rules into an recursive function.
It's like in a groupby.agg
but that I can't just pass a dictionnary into it because the new columns are based on the number of iteration of the recursive function on each key.