I have the following dataframe:
   id_parent  id_child
0       1100      1090
1       1100      1080
2       1100      1070
3       1100      1060
4       1090      1080
5       1090      1070
6       1080      1070
and I only want to keep the direct parent-child connections. Example: 1100 is connected to 1090, 1080, and 1070, but of these only 1090 should be kept, because 1080 and 1070 are already children of 1090. The connection from 1100 to 1060 is also kept, since 1060 has no other parent. This example df contains only one cluster; the real df consists of multiple parent/child clusters.
Therefore the output should look like this:
   id_parent  id_child
0       1100      1090
1       1090      1080
2       1080      1070
3       1100      1060
Sample code:
import pandas as pd
#create sample input
df_input = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1100, 2: 1100, 3: 1100, 4: 1090, 5: 1090, 6: 1080}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060, 4: 1080, 5: 1070, 6: 1070}})
#create sample output
df_output = pd.DataFrame.from_dict({'id_parent': {0: 1100, 1: 1090, 2: 1080, 3: 1100}, 'id_child': {0: 1090, 1: 1080, 2: 1070, 3: 1060}})
My current approach would be based on this question: Creating dictionary of parent child pairs in pandas dataframe. But maybe there is a simple, clean way to solve this without relying on additional non-standard libraries?
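For reference, the operation described above is usually called a transitive reduction of a directed graph. Below is a minimal sketch of one way it might be done with only pandas and the standard library. It assumes the parent/child pairs contain no cycles (a cycle would make the recursion loop forever), and the function name `direct_edges` is hypothetical, not taken from any library.

```python
import pandas as pd

df_input = pd.DataFrame({
    "id_parent": [1100, 1100, 1100, 1100, 1090, 1090, 1080],
    "id_child":  [1090, 1080, 1070, 1060, 1080, 1070, 1070],
})


def direct_edges(df, parent="id_parent", child="id_child"):
    """Keep only edges whose child is not reachable via another child.

    Assumption: the edges form a directed acyclic graph.
    """
    # Map every parent to the set of children listed for it.
    children = {}
    for p, c in zip(df[parent], df[child]):
        children.setdefault(p, set()).add(c)

    # Memoised lookup of all descendants of a node.
    cache = {}

    def descendants(node):
        if node not in cache:
            found = set()
            for c in children.get(node, ()):
                found.add(c)
                found |= descendants(c)
            cache[node] = found
        return cache[node]

    # An edge (p, c) is redundant if c is a descendant of a sibling k.
    keep = [
        not any(c in descendants(k) for k in children[p] if k != c)
        for p, c in zip(df[parent], df[child])
    ]
    return df.loc[keep].reset_index(drop=True)


result = direct_edges(df_input)
```

The rows in `result` keep their order from the input, so they may appear in a different order than in the desired output shown earlier; comparing the two as sets of (parent, child) pairs avoids that difference. Because each cluster is just a disconnected part of the same edge list, multiple clusters should be handled without extra grouping.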