2

I want to merge two DataFrames on an exact match on the column Name and a partial match between ID and String. Here is an example:

The first DataFrame (df1) looks like this:

Name ID    Tag
AAA  1111  XYZ
AAA  1122  DEF
BBB  2222  PQR
BBB  2211  MNL
CCC  3333  STU
CCC  3311  JKL
DDD  4444  ABC

The second DataFrame (df2) looks like this;

String        Name
aasd1111dasd  AAA
sdf1122sdf    AAA 
sdffgs2222f   BBB
fsfg3333sdf   CCC
sefg4444sdf   DDD

What I would like to get to is a join to give the following:

String        Name  Tag
aasd1111dasd  AAA   XYZ
sdf1122sdf    AAA   DEF
sdffgs2222f   BBB   PQR
fsfg3333sdf   CCC   STU 
sefg4444sdf   DDD   ABC
Alex
  • 6,610
  • 3
  • 20
  • 38
Taylrl
  • 3,601
  • 6
  • 33
  • 44

1 Answers1

1

I think what you are trying to achieve is a merge on Name and then only keeping rows where ID is in the field String.

Using:

# df
    Name   ID  Tag
0   AAA  1111  XYZ
1   AAA  1122  DEF
2   BBB  2222  PQR
3   BBB  2211  MNL
4   CCC  3333  STU
5   CCC  3311  JKL
6   DDD  4444  ABC

# df2
          String  Name
0   aasd1111dasd   AAA
1     sdf1122sdf   AAA  # This line needed to be added to df2
2    sdffgs2222f   BBB
3    fsfg3333sdf   CCC
4    sefg4444sdf   DDD

Merge:

m = df.merge(df2)

Create a mask:

mask = m.apply(lambda x: str(x["ID"]) in x["String"], axis=1)

Filter by the mask:

m = m[mask][["String", "Name", "Tag"]]

Outputs:

         String Name  Tag
0  aasd1111dasd  AAA  XYZ
3    sdf1122sdf  AAA  DEF
4   sdffgs2222f  BBB  PQR
6   fsfg3333sdf  CCC  STU
8   sefg4444sdf  DDD  ABC
Alex
  • 6,610
  • 3
  • 20
  • 38