1

I'm trying to compare 2 columns in one dataframe(df1) with 2 columns in another dataframe(df2). After comparison, I want to select the rows where the first two columns do not match. You can see my attempts below and this what the dataframes look like [1]

import pandas as pd

fd1= 'Q37.xlsx'
fd2= 'Q43.xlsx'
df1 = pd.read_excel( fd1, sheetname='prio 1') 
df2 = pd.read_excel( fd2, sheetname='prio 1')


closed_items= {} #items in fd1 but not in fd2
new_items={}  #items in fd2 but not in fd1

In order to get closed_items, I've tried the following 3 things

closed_items.where(df1[df1['Code'].values!=df2[df2['Code'].values and 
                   df1['Owner'].values != key in df1['Owner'].values)

and gotten

ValueError: Can only compare identically-labeled Series objects

I've also tried

Closed_items = df2.loc[(df2['Code'] != df1['Code']) and 
               df2.loc[(df2['Owner'] != df1['Owner'])]

And lastly I tried

for key in df1['Code'].values:
    if key in df1['Code'].values != key in df1['Code'].values or key in 
              df1['Owner'].values != key in df1['Owner'].values:

          closed_items.append()
     else:
           pass 

Which gave this syntax

 The truth value of an array with more than one element is ambiguous. 
 Use a.any() or a.all()

...

AFP= pd.ExcelWriter("AFP.xlsx", engine='xlsxwriter')

closed_items.to_excel(AFP, sheet_name='Closed', index=False)
user3471881
  • 2,614
  • 3
  • 18
  • 34
  • I would think better is to have equally shaped series and try to use something like not s1.intersection(s2) or difference https://stackoverflow.com/questions/18079563/finding-the-intersection-between-two-series-in-pandas https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Index.difference.html – Sergii Nov 27 '18 at 12:18

1 Answers1

0

The problem is that df1 and df2 are of different shapes hence the loc will not work. You first need to merge df1 and df2 like

df3 = df1.merge(df2,on='common_key',how='left',suffixes=('_df1','_df2'))
            df3['select'] = 0
df3.loc[(df3['Code_df1'] == df3['Code_df2']) & 
                           (df3.loc[(df3['Owner_df1'] == df3['Owner_df2']),'select'])] = 1

df3.loc[df3['select']==0,:]

will return wherever they do not match

Abhishek Sharma
  • 1,909
  • 2
  • 15
  • 24
  • It returns Invalid syntax on df3.loc[(df3['Code_df1'] == df3['Code_df2']) & df3.loc[(df3['Owner_df1'] == df3['Owner_df2']),'select'] = 1 – gabriella Nov 29 '18 at 08:50
  • Use this I missed a closing parenthesis while copying the code. df3.loc[(df3['Code_df1'] == df3['Code_df2']) & (df3.loc[(df3['Owner_df1'] == df3['Owner_df2']),'select'])] = 1 – Abhishek Sharma Nov 30 '18 at 09:10