I have a dataframe with four columns: parent_serialno, child_serialno, parent_function, and child_function. I would like to construct a dataframe where each row is a root parent and each column is a function with the values being the serial number for that function.
For example, the dataframe looks like this:
df = pd.DataFrame(
[['001', '010', 'A', 'B'], ['001', '020', 'A', 'C'], ['010', '100', 'B', 'D'], ['100', '110', 'D', 'E'],
['002', '030', 'A', 'B'], ['002', '040', 'A', 'C']],
columns=['parent_serialno', 'child_serialno', 'parent_function', 'child_function'])
Note that not all functions contain a descendant for every root, but there is only one serial number for each function for a given root. The root serial numbers are known ahead of time.
What I would like to output looks like a dataframe like:
pd.DataFrame([['001','010','020','100','110'],['002','030','040', np.nan, np.nan]], columns = ['A','B','C','D','E'])
Out[1]:
A B C D E
0 001 010 020 100 110
1 002 030 040 NaN NaN
This post shows how to get a dictionary hierarchy, but I'm less concerned about identifying the location of a leaf in the tree (i.e. grandchild vs great-grandchild) and more concerned with just identifying the root and function of each leaf.