0

I am trying to create a parent child relationship between two dataframes:

df_5:

    parent  Tag_Name
0   NaN             Mkt30
1   NaN             Mkt31
2   NaN             Mkt36
3   NaN             Mkt37

df_4:

    Tag_Name    parent
0   Mkt34       1fabfd31-f6aa-4061-a692-35bf6d19c9ae
1   Mkt29       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
2   Mkt35       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
3   Mkt30       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
4   Mkt32       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
5   Mkt31       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
6   Mkt36       edfff48b-a9d6-4c56-84c3-0fb8dab9b88f
7   Mkt38       1fabfd31-f6aa-4061-a692-35bf6d19c9ae
8   Mkt39       1fabfd31-f6aa-4061-a692-35bf6d19c9ae
9   Mkt33       1fabfd31-f6aa-4061-a692-35bf6d19c9ae
10  Mkt40       1fabfd31-f6aa-4061-a692-35bf6d19c9ae
11  Mkt37       1fabfd31-f6aa-4061-a692-35bf6d19c9ae

Following is the code that i am using:

df_5.loc[df_5.Tag_Name.isin(df_4.Tag_Name), ['parent']] = df_4[['parent']]

This is supposed to be matching the values from Tag_Name column of df_5, finding it in df_4 and then returning its parent value in df_5 from df_4. This line of code fails when there are more number of rows as shown in the example above.

Here the current output will be as follows which is NOT correct.

df_5:

    parent                                  Tag_Name
0   1fabfd31-f6aa-4061-a692-35bf6d19c9ae                Mkt30
1   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt31
2   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt36
3   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt37

Ideal expected output should be:

df_5:

    parent                                  Tag_Name
0   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt30
1   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt31
2   edfff48b-a9d6-4c56-84c3-0fb8dab9b88f                Mkt36
3   1fabfd31-f6aa-4061-a692-35bf6d19c9ae                Mkt37

Is there any one liner code that will work in this case? I am looking to simplify the solution without using pd.merge which will give me the correct value.

RSM
  • 645
  • 10
  • 25
  • is there a reason you don't want to use merge? You can merge a specific column in with one line to give you what you need. – David Erickson Oct 20 '20 at 21:01
  • This is what you are looking for: `pd.merge(df_5[['Tag_Name']], df_4, how='left',on='Tag_Name')` – David Erickson Oct 20 '20 at 21:05
  • @DavidErickson Thanks. I was avoiding pd.merge to reduce the no. of lines in the code since this will be repeated multiple times at different levels. will go with this if there is no other option.. – RSM Oct 21 '20 at 05:25
  • you can also do multiple pd merges at one once with functools: https://stackoverflow.com/questions/34338831/pandas-merge-multiple-dataframes-and-control-column-names – David Erickson Oct 21 '20 at 05:29

0 Answers0