1

I would like to join two files by key but I would like some columns to be joined together for example:

File1:

    df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
               'Color' : ['red', 'red', 'blue','yellow', 'red']})

        List        Color
0  P111              red
1  P999              red
2  P111;P999;P777    blue
3  P555              yellow
4  P666;P111;P333    red

File2:

    df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
               'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
               'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})

    Cod     Animal     Letter
 0  P111    DOG         A,F
 1  P222    CAT         C
 2  P333    BUG         S,M
 3  P444    SNAKE,DOG   F,L
 4  P555    CAT,BUG     C,A
 5  P666    DOG         M,C
 6  P777    SNAKE       Z,L

I would like to merge with primary key List (from file1) and Cod (from file2), to get:

     List           Color     Animal             Letter
0  P111              red      DOG,FROG           A,F
1  P999              red      -                  -
2  P111;P999;P777    blue     DOG,FROG|-|SNAKE   A,F|-|Z,L
3  P555              yellow   CAT,BUG            C,A 
4  P666;P111;P333    red      DOG|DOG,FROG|BUG   M,C|A,F|S,M

I think we need something like a left join and an agragation but I don't know how. In the final table I would like that in the values ​​not found there was a - . While I would like a | to separate values ​​in "aggregated" columns

Which is the best way?

Mario
  • 89
  • 6

2 Answers2

3

Idea is use DataFrame.explode by splitted values of List, then use left join and aggregate first values with join for expected ouput:

df = (df1.assign(Cod = df1['List'].str.split(';'))
         .explode('Cod')
         .reset_index()
         .merge(df2, how='left', on='Cod')
         .fillna('-')
         .groupby('index')
         .agg(List=('List','first'),
              Color=('Color','first'),
              Animal=('Animal','|'.join),
              Letter=('Letter','|'.join))
         .rename_axis(None))
print (df)
             List   Color       Animal       Letter
0            P111     red          DOG          A,F
1            P999     red            -            -
2  P111;P999;P777    blue  DOG|-|SNAKE    A,F|-|Z,L
3            P555  yellow      CAT,BUG          C,A
4  P666;P111;P333     red  DOG|DOG|BUG  M,C|A,F|S,M
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi sorry if I'm bothering you, is there a way to make all columns in df1 be 'first' and all columns in df2 join without specifying them one by one? – Mario Sep 03 '22 at 13:53
  • I have also open another question for this problem: https://stackoverflow.com/questions/73593299/groupby-and-agg-with-multiple-columns-pandas – Mario Sep 03 '22 at 15:13
  • I have a (I hope small question): if in df1 i have a row like: **P111,P123;P555,P777 red** I can split like: **.str.split('[,;]'))** but but as imposed "," instead of ";" in the join where needed? – Mario Oct 14 '22 at 15:26
  • I've created a dedicated question here if you would have a moment to take a look at it: https://stackoverflow.com/questions/74117549/merge-two-columns-from-different-df-and-put-the-results-in-the-same-columns-pand – Mario Oct 20 '22 at 07:00
0

Please specify if you want to perform join or not. Since you didn't mention any primary key. I am assuming you want to concatenate.

import pandas as pd
df1 = pd.read_excel(pwd + '/Book1.xlsx')
df2 = pd.read_excel(pwd + '/Book2.xlsx')


df_combined = pd.concat([df1,df2], ignore_index=True)


print(df_combined)

   list   color   cod animal
0  p111     red   NaN    NaN
1  p222    blue   NaN    NaN
2  p333  green    NaN    NaN
3  p444  yellow   NaN    NaN
4  p555     NaN   NaN    NaN
5   NaN     NaN  p111    rat
6   NaN     NaN  p222    cat
7   NaN     NaN  p333    bat
8   NaN     NaN  p444    hat
9   NaN     NaN  p555   cool

Result will be like above-

Also please provide sample data so that it can be in copy-pasted.

GIRIXH
  • 53
  • 8