I have a pandas dataframe which contains a list of big family members.
import pandas as pd
data = {'child':['Joe','Anna','Anna','Steffani','Bob','Rea','Dani','Dani','Selma','John','Kevin'],
'parents':['Steffani','Bob','Steffani','Dani','Selma','Anna','Selma','John','Kevin','-','Robert'],
}
df = pd.DataFrame(data)
From this dataframe, I need to build a new table by adding multiple columns to the right which show the relationships between data. The value in the right column shows the elder relation. Each column represents the relationship. If I can draw the diagram, it may look like this:
child --> parents --> grandparents --> parents of grandparents --> grandparents of grandparents --> etc.
So, the expected output of the dataframe will be like this:
child parents A B C D (etc)
---------------------------------------------------------------------------------
0 Joe Steffani Dani Selma Kevin <If still possible>
1 Joe Steffani Dani John -
2 Anna Bob Selma Kevin Robert
3 Anna Steffani Dani Selma Kevin
4 Anna Steffani Dani John -
5 Steffani Dani Selma Kevin Robert
6 Steffani Dani John - -
7 Bob Selma Kevin Robert -
8 Rea Anna Bob Selma Kevin
9 Rea Anna Steffani Dani Selma
10 Rea Anna Steffani Dani John
11 Dani Selma Kevin Robert -
12 Dani John - - -
13 Selma Kevin Robert - -
14 John - - - -
15 Kevin Robert - - -
Currently, I build the new table manually using pandas.merge
. But I need to do it many times until the last column which has no elder relationship with the left column.
For example:
Step 1
df2 = pd.merge(df, df, left_on='parents', right_on='child', how='left').fillna('-')
df2 = df2[['child_x','parents_x','parents_y']]
df2.columns = ['child','parents','A']
Step 2
df3 = pd.merge(df2, df, left_on='A', right_on='child', how='left').fillna('-')
df3 = df3[['child_x','parents_x','A','parents_y']]
df3.columns = ['child','parents','A','B']
Step 3
df4 = pd.merge(df3, df, left_on='B', right_on='child', how='left').fillna('-')
df4 = df4[['child_x','parents_x','A','B','parents_y']]
df4.columns = [['child','parents','A','B','C']]
Step 4
Write the similar code to add the 6th column for column D if the value in Column C still has elder relation.
Problem:
Since I have big data in the dataframe (more than 10K of datapoints), how to solve it without writing code step by step? I don't know how many steps I need to build the final table.
Thanks in advance for any help.