I have 1 table A having data in the following format
Desc Name
Demo1 abc.def.txt.xls
Demo2 pqr.wef.tuv.wsf
Demo3 yrd.itr.poe.kjq.sce
I have another table B having data in the following format
Key Id
abc.def.txt 1000
pqr.wef 1001
yrd 1002
I have table C having data in the following format
Hie1Nam Hie1ID Hie2Name Hie2ID Hie3Name Hie3ID Hie4Name
Hie4Id Hie5Name Hie5Id
abc 1 abc.def 11 abc.def.txt 111
abc.def.txt.xls 1111
pqr 2 pqr.wef 12 pqr.wef.tuv 112
pqr.wef.tuv.wsf 1112
yrd 3 yrd.itr 13 yrd.itr.poe 113 yrd.itr.poe.kjq
1113 yrd.itr.poe.kjq.sce 11113
Desired output is as below
Desc Name Id
Demo1 abc.def.txt.xls 1000
Demo2 pqr.wef.tuv.wsf 1001
Demo3 yrd.itr.poe.kjq.sce 1002
So, the Name needs to be splited by removing the last dot and following sequence of characters to get its parent.Either it's parent or any of its grandparents will have an entry in table B.I need to get the corresponding id.
These 3 tables are stored as dataframes in databricks.
How can I achieve this in an easy way using spark sql or scala?