1

I have 2 dataframes with different no. of rows and different column names. I want to compare and get the matching rows specific to that columns as output.

e.g

df1 = pd.DataFrame({'foo': [11, 22, 33], 'bar': ['aa', 'ab', 'ac'], 'foobar': [111, 222, 333]})
df2 = pd.DataFrame({'AA': [1,22], 'BB': ['see','ab'], 'CC': [123,222]})

df1:    foo bar foobar
     0  11  aa  111
     1  22  ab  222
     2  33  ac  333


df2:    AA  BB  CC
     0  1   see 123
     1  22  ab  222

df2 not necessarily has to have same no of rows and columns.

expected output: for matching rows of df2 in df1

df3:        
        foo bar foobar
    1   22  ab  222
     

I have tried using np.all, but this seems to work only if we have same no. of rows or single row in df2.

df3 = df1.loc[np.all(df1[['bar','foobar']].values == df2[['BB','CC']].values, axis=1),:]

Essentially needed, difference rows or matching rows from any of the df1 or df2.

expected output: for unmatched rows of df1 from df2

df3:        
        foo bar foobar
    0   11  aa  111
    2   33  ac  333

Imagine in this case: The order of columns are different, column mapping I will do. example: ( if columns values of a,b,c of df1 == column values of d,e,f in df2) get me the matched rows form df1 or df2.

df1 = pd.DataFrame({'foo': [11, 22, 33], 'bar': ['aa', 'ab', 'ac'], 'foobar': [111, 222, 333], 'barfoo':[2,22,34]})
df2 = pd.DataFrame({'AA': [22,33], 'CC': [222,333], 'BB': ['ab','ac']})

output : In this case I am matching on (foo:AA, bar:BB, foobar:CC)

df3: 
        foo bar foobar barfoo
    1   22  ab  222 22
    2   33  ac  333 34

Appreciate and thanks.

theMerakist
  • 239
  • 2
  • 14

2 Answers2

1

You can temporarily rename the columns of df2 and perform the inner join (a.k.a. merge) on the two dataframes. It will find all rows that are present in both dataframes:

mapper = dict(zip(df2, df1)) # Column mapper
df2.rename(columns=mapper).merge(df1)
#   foo bar  foobar
#0   22  ab     222
DYZ
  • 55,249
  • 10
  • 64
  • 93
  • Thanks @DYZ but this helps only if we have same no. of columns in both df. I have edited my question. FYI again thanks. – theMerakist Aug 07 '20 at 06:28
0
import pandas as pd

df1 = pd.DataFrame({'foo': [11, 22, 33], 'bar': ['aa', 'ab', 'ac'], 'foobar': [111, 222, 333]})
df2 = pd.DataFrame({'AA': [1,22], 'BB': ['see','ab'], 'CC': [123,222]})
df3 = df2.rename({'AA': 'foo', 'BB': 'bar', 'CC': 'foobar'})
df3 = df1.merge(df3, how = 'inner' ,indicator=False)
print('df1\n',df1)
print('df2\n',df2)
print('df3\n',df3)

Output

df1
    foo bar  foobar
0   11  aa     111
1   22  ab     222
2   33  ac     333
df2
    AA   BB   CC
0   1  see  123
1  22   ab  222
df3
    foo bar  foobar
0   22  ab     222
Adhun Thalekkara
  • 713
  • 10
  • 23
  • Thanks @Adhun Thalekkara but, I do not want to rename the columns, and In my case the no. of columns in two df are different. same with no of rows. – theMerakist Aug 07 '20 at 05:49
  • @theMerakist actually df2 is not renamed. for better understanding updated the answer – Adhun Thalekkara Aug 07 '20 at 05:55
  • df3 = df1.loc[np.all(df1[['bar','foobar']].values == df2[['BB','CC']].values, axis=1),:] , I am looking in these lines, Independent of column names and size. But this is throwing an axis error : AxisError: axis 1 is out of bounds for array of dimension 0 – theMerakist Aug 07 '20 at 06:17