I have a DataFrame with two columns, Parent and Child, that encode a hierarchy of unknown depth, as shown below:
Parent | Child |
---|---|
A | B |
B | C |
C | D |
B | BB |
X | Y |
Y | Z |
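The result should look something like the table below, with one row per root-to-leaf path and `granularity_level` giving the number of levels in that path. Ideally I'd like to do this with PySpark DataFrame functions or Spark SQL: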
level1 | level2 | level3 | level4 | granularity_level |
---|---|---|---|---|
A | B | C | D | 4 |
A | B | BB | Null | 3 |
X | Y | Z | Null | 3 |
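
To pin down the logic I'm after, here is a minimal plain-Python sketch (not Spark code) that produces the rows above from the sample edges. The function name `flatten_hierarchy` is just something I made up for illustration, and it assumes the data has no cycles and that a root is any Parent that never appears as a Child.

```python
from collections import defaultdict


def flatten_hierarchy(edges):
    """Return one tuple per root-to-leaf path, padded with None, plus the path length."""
    children = defaultdict(list)
    all_children = set()
    for parent, child in edges:
        children[parent].append(child)
        all_children.add(child)

    # Roots are parents that never show up as a child (first-seen order is kept).
    parents_in_order = dict.fromkeys(parent for parent, _ in edges)
    roots = [p for p in parents_in_order if p not in all_children]

    # Iterative depth-first walk; assumes the hierarchy is acyclic.
    paths = []
    stack = [[root] for root in reversed(roots)]
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            paths.append(path)  # reached a leaf, so the path is complete
        for kid in reversed(kids):
            stack.append(path + [kid])

    # Pad every path to the deepest level and append its own depth.
    width = max((len(p) for p in paths), default=0)
    return [tuple(p + [None] * (width - len(p)) + [len(p)]) for p in paths]


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "BB"), ("X", "Y"), ("Y", "Z")]
rows = flatten_hierarchy(edges)
for row in rows:
    print(row)
```

What I can't work out is how to express this in Spark when the number of levels is not known in advance.
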
Thanks & Regards