I've got a hard problem with transforming a dataframe into another one.
I don't know what functions I should use to do what I want. I had some ideas that didn't work at all.
For example, I do not understand how I should use append (or if I should use it or something else) to do what I want.
Here is my original dataframe:
df1 = pd.DataFrame({
'Key': ['K0', 'K1', 'K2'],
'X0': ['a','b','a'],
'Y0': ['c','d','c'],
'X1': ['e','f','f'],
'Y1': ['g','h','h']
})
Key X0 Y0 X1 Y1
0 K0 a c e g
1 K1 b d f h
2 K2 a c f h
This dataframe represents every links attached to an ID in column Key. Links are following each other : X0->Y0 is the father of X1->Y1.
It's easy to read, and the real dataframe I'm working with has 6500 rows by 21 columns that represents a tree of links. So this dataframe has an end to end links logic.
I want to transform it into another one that has a unitary links and ID logic (because it's a tree of links, some unitary links may be part of multiple end to end links)
So I want to get each individual links into X->Y and I need to get the list of the Keys attached to each unitary links into Keys.
And here is what I want :
df3 = pd.DataFrame({
'Key':[['K0','K2'],'K1','K0',['K1','K2']],
'X':['a','b','e','f'],
'Y':['c','d','g','h']
})
Key X Y
0 [K0, K2] a c
1 K1 b d
2 K0 e g
3 [K1, K2] f h
To do this, I first have to combine X0 and X1 into a unique X column, idem for Y0 and Y1 into a unique Y column. At the same time I need to keep the keys attached to the links. This first transformation leads to a new dataframe, containing all the original information with duplicates which I will deal with after to obtain df3.
Here is the transition dataframe :
df2 = pd.DataFrame({
'Key':['K0','K1','K2','K0','K1','K2'],
'X':['a','b','a','e','f','f'],
'Y':['c','d','c','g','h','h']
})
Key X Y
0 K0 a c
1 K1 b d
2 K2 a c
3 K0 e g
4 K1 f h
5 K2 f h
Transition from df1 to df2
For now, I did this to put X0,X1 and Y0,Y1 into X and Y :
Key = pd.Series(dtype=str)
X = pd.Series(dtype=str)
Y = pd.Series(dtype=str)
for i in df1.columns:
if 'K' in i:
Key = Key.append(df1[i], ignore_index=True)
elif 'X' in i:
X = X.append(df1[i], ignore_index=True)
elif 'Y' in i:
Y = Y.append(df1[i], ignore_index=True)
0 K0
1 K1
2 K2
dtype: object
0 a
1 b
2 a
3 e
4 f
5 f
dtype: object
0 c
1 d
2 c
3 g
4 h
5 h
dtype: object
But I do not know how to get the keys to keep them in front of the right links.
Also, I do this to construct df2, but it's weird and I do not understand how I should do it :
df2 = pd.DataFrame({
'Key':Key,
'X':X,
'Y':Y
})
Key X Y
0 K0 a c
1 K1 b d
2 K2 a c
3 NaN e g
4 NaN f h
5 NaN f h
I tried to use append, to combine the X0,X1 and Y0,Y1 columns directly into df2, but it turns out to be a complete mess, not filling df2 columns with df1 columns content. I also tried to use append to put the Series Key, X and Y directly into df2, but it gave me X and Y in rows instead of columns.
In short, I'm quite lost with it. I know there may be a lot to program to take df1, turn in into df2 and then into df3. I'm not asking for you to solve the problem for me, but I really need help about the functions I should use or the logic I should put in place to achieve my goal.