Split and join dataframe on the basis of delimited character

Question

I have 1 table A having data in the following format

    Desc    Name
    Demo1   abc.def.txt.xls
    Demo2   pqr.wef.tuv.wsf
    Demo3   yrd.itr.poe.kjq.sce

I have another table B having data in the following format

    Key     Id
    abc.def.txt    1000
    pqr.wef    1001
    yrd    1002

I have table C having data in the following format

    Hie1Nam    Hie1ID    Hie2Name    Hie2ID    Hie3Name    Hie3ID    Hie4Name 
    Hie4Id    Hie5Name    Hie5Id
    abc             1   abc.def     11      abc.def.txt     111     
    abc.def.txt.xls     1111    
    pqr             2   pqr.wef     12      pqr.wef.tuv     112     
    pqr.wef.tuv.wsf     1112
    yrd    3    yrd.itr    13    yrd.itr.poe    113    yrd.itr.poe.kjq
    1113    yrd.itr.poe.kjq.sce    11113

Desired output is as below

    Desc    Name                Id
    Demo1   abc.def.txt.xls     1000
    Demo2   pqr.wef.tuv.wsf     1001
    Demo3   yrd.itr.poe.kjq.sce 1002

So, the Name needs to be splited by removing the last dot and following sequence of characters to get its parent.Either it's parent or any of its grandparents will have an entry in table B.I need to get the corresponding id.

These 3 tables are stored as dataframes in databricks.

How can I achieve this in an easy way using spark sql or scala?

please dont post images you can post the code or at lest data as table format. now answerers has to do lot of hard work to convert them in to data frames for example data. — user3190018, Mar 20 '20 at 03:41

Split and join dataframe on the basis of delimited character

0 Answers0