I am doing a set of merging on two dataframes. One data frame contains a two columns which shows which a parent child connection. Like this
RID | Child_RID |
---|---|
xxxxx | yyyyy |
yyyyy | zzzzz |
Where a child will appear in the RID section to as they may also have children
I have a second dataframe that has more information on an object like this
name | RID | Child_1 |
---|---|---|
site | xxxx | yyyy |
site | xxxx | oooo |
somthing | iiii | llll |
I wrote a merging function to reuse as i will keep doing it for a number of times until there are no children left.
And for the first merge it works fine and produces what I want. But on the second and third merge it drops information. I would expect for it to have nan if there is missing data which is what I want. but it erases pretty much parts or all the of previous merges.
def merge_dataset(df1, list_df, index_1, index_2):
num_df = pd.DataFrame(list_df, columns = [index_1,index_2])
df2 = pd.merge(df1, num_df, on=index_1)
return df2
And this is how I use this method
mg1 = merge_dataset(df, parent_child_longlist, 'child_1','child_2')
# Change column order
cols = ['name', 'rid', 'child_1','child_2']
mg1 = mg1[cols]
And this works fine producing this
The second merge seemed to drop information from the dataframe which I dont know why. I would expect for it to have more rows than the first merge
I called the second merge like the first one
mg2 = merge_dataset(mg1, parent_child_longlist, 'child_2','child_3')
cols = ['name', 'rid', 'child_1','child_2', 'child_3']
mg2 = mg2[cols]
The issue is magnified with the third merge. I call it the same way
mg3 = merge_dataset(mg2, parent_child_longlist, 'child_3','child_4')
cols = ['name', 'rid', 'child_1','child_2', 'child_3', 'child_4']
mg3 = mg3[cols]
But then drops 90% of the data like this
So I am a bit lost as to why it would drop all the data after this third merge when the first two seems to work just fine. And i will need to do this 10 or 20 more times until all the parent child connections are mapped in this dataframe.