0

I would like to join two dataframes columns by particular columns but I would like some columns to be joined together. I asked something similar here: old question

But now I would need to further split the results in the merge column:

df1 = pd.DataFrame({'List' : ['P111:P666', 'P999', 'P111;P999:P777 ','P555', 'P666:P111;P333'],
                    'Color' : ['R', 'R', 'B','Y', 'R']})

          List        Color
 0        P111:P666     R
 1             P999     R
 2  P111;P999:P777      B
 3             P555     Y
 4   P666:P111;P333     R


df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
                    'Animal' : ['DOG,FROG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE']})
    Cod     Animal
0  P111   DOG,FROG
1  P222        CAT
2  P333        BUG
3  P444  SNAKE,DOG
4  P555    CAT,BUG
5  P666        DOG
6  P777      SNAKE

And I would like:

     List           Color     Animal             
 0  P111:P666           R     DOG,FROG:DOG      
 1  P999                R     -                  
 2  P111;P999:P777      B     DOG,FROG;-:SNAKE  
 3  P555                Y     CAT,BUG            
 4  P666:P111;P333      R     DOG:DOG,FROG;BUG   

Basically I would like to put ";" and ":" in the "Animal" match column in accordance with the signs in the initial "List" column. If there isn't math I put "-"

Mario
  • 89
  • 6

1 Answers1

0

You can split, explode, then merge and aggregate again:

out = df1.join(
 df1['List']
 .str.split(';')
 .explode().reset_index(name='Cod')
 .merge(df2, how='left')
 .fillna('-')
 .groupby('index')
 .agg({'Animal': ':'.join,
       'Letter': '|'.join})
 )

Output:

             List   Color            Animal       Letter
0            P111     red          DOG,FROG          A,F
1            P999     red                 -            -
2  P111;P999;P777    blue  DOG,FROG:-:SNAKE    A,F|-|Z,L
3            P555  yellow           CAT,BUG          C,A
4  P666;P111;P333     red  DOG:DOG,FROG:BUG  M,C|A,F|S,M

keep separators

We need a different approach, using a regex:

regex = '|'.join(df2['Cod'])+'|\w+'
mapper = df2.set_index('Cod')['Animal'].to_dict()

df1['Animal'] = df1['List'].str.replace(regex, lambda x: mapper.get(x.group(), '-'), regex=True)

Output:

             List   Color            Animal
0            P111     red          DOG,FROG
1            P999     red                 -
2  P111;P999;P777    blue  DOG,FROG;-;SNAKE
3            P555  yellow           CAT,BUG
4  P666;P111;P333     red  DOG;DOG,FROG;BUG
mozway
  • 194,879
  • 13
  • 39
  • 75
  • this does not work because for example the line "P111;P999;P777 blue DOG:-:SNAKE A,F|-|Z,L" should instead be "P111;P999:P777 blue DOG,FROG;-:SNAKE A,F|-|Z,L" according to the signs on the initial list – Mario Oct 18 '22 at 21:36
  • The problem is to make the separators of the "List" column propagate on the "Animals" column – Mario Oct 18 '22 at 21:37
  • I see, then it probably needs a list of potential separators and a loop – mozway Oct 18 '22 at 21:39
  • Do you think it is very complex? – Mario Oct 18 '22 at 21:51
  • 1
    Maybe not but I'm on my phone and the screen is a bit tiny for that, maybe tomorrow (don't hesitate to remind me) – mozway Oct 18 '22 at 21:55
  • Would you have a moment for a check now? – Mario Oct 19 '22 at 12:52
  • I can't really resolve, sorry for insistence – Mario Oct 19 '22 at 19:55
  • See update for the mapping while keeping separators. Now you can combine both approaches (with only aggregation of the letters for the first part) – mozway Oct 19 '22 at 20:16
  • First of all, thank you for your reply and your time. However I don't think I understand it, now I just get ";" as a seperator? – Mario Oct 19 '22 at 20:29
  • Well then maybe there were too many infromations and it was confusing, could I take a quick look at you now? I need that in animals the elements are reported taking into account ":" and ";" initials in list – Mario Oct 19 '22 at 20:53