
I have two different DataFrames that I need to compare.

The two DataFrames have different numbers of rows and don't have a single-column primary key; the key is the composite (id||ver||name||prd||loc).

df1:

id ver name   prd  loc
a  1   surya  1a   x
a  1   surya  1a   y
a  2   ram    1a   x
b  1   alex   1b   z
b  1   alex   1b   y
b  2   david  1b   z

df2:

id ver name   prd  loc
a  1   surya  1a   x
a  1   surya  1a   y
a  2   ram    1a   x
b  1   alex   1b   z
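To make the example reproducible, the two tables above can be built directly as DataFrames. This is a sketch that assumes `ver` is an integer column and the other columns are strings; the real `Source` and `Target` data may use different types.

```python
import pandas as pd

# Column order matches the tables above; all five columns form the composite key.
columns = ['id', 'ver', 'name', 'prd', 'loc']

df1 = pd.DataFrame([
    ['a', 1, 'surya', '1a', 'x'],
    ['a', 1, 'surya', '1a', 'y'],
    ['a', 2, 'ram',   '1a', 'x'],
    ['b', 1, 'alex',  '1b', 'z'],
    ['b', 1, 'alex',  '1b', 'y'],
    ['b', 2, 'david', '1b', 'z'],
], columns=columns)

df2 = pd.DataFrame([
    ['a', 1, 'surya', '1a', 'x'],
    ['a', 1, 'surya', '1a', 'y'],
    ['a', 2, 'ram',   '1a', 'x'],
    ['b', 1, 'alex',  '1b', 'z'],
], columns=columns)

print(df1.shape, df2.shape)  # (6, 5) (4, 5)
```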

I tried the code below. It works when both DataFrames have the same number of rows, but in a case like the one above it does not work.

import numpy as np
import pandas as pd

df1 = pd.DataFrame(Source)  # Source holds my raw source data
df1 = df1.astype(str)  # convert every value to str for easy comparison

df2 = pd.DataFrame(Target)  # Target holds my raw target data
df2 = df2.astype(str)  # convert every value to str for easy comparison

header_list = df1.columns.tolist()  # both DataFrames have the same columns

df3 = pd.DataFrame(data=None, columns=df1.columns, index=df1.index)

# element-wise comparison; only works when df1 and df2 have identical row labels
for col in header_list:
    df3[col] = np.where(df1[col] == df2[col], 'True', 'False')

df3.to_csv('Output', index=False)

Please let me know how to compare the datasets when they have different numbers of rows.
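The loop above fails for unequal row counts because `==` between two Series requires identical row labels: `df1` has labels 0 to 5 and `df2` has 0 to 3, so pandas raises a `ValueError` before `np.where` ever runs. A minimal sketch reproducing that with a cut-down `id` column (the `left`/`right` names are illustrative only):

```python
import pandas as pd

# The same column taken from two frames of different length (6 rows vs 4 rows).
left = pd.Series(['a', 'a', 'a', 'b', 'b', 'b'], name='id')
right = pd.Series(['a', 'a', 'a', 'b'], name='id')

try:
    left == right  # element-wise comparison needs matching index labels
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Can only compare identically-labeled Series objects
```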

Comment: Since both the dataframes have the `id` column in common, would it be sufficient to restrict the check to just the `id` column? Please update your post with the expected output for your dfs. This [post](https://stackoverflow.com/help/how-to-ask) should help you get started. – Balaji Ambresh Aug 20 '20 at 08:32

1 Answer


You can try this:

~df1.isin(df2)
# df1[~df1.isin(df2)].dropna()
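Applied to the data from the question, a sketch (assuming the tables are rebuilt with an integer `ver` column) looks like this. Because `isin` with a DataFrame argument aligns on index and column labels, it gives the expected answer here only because the first four rows of both frames happen to be in the same order.

```python
import pandas as pd

columns = ['id', 'ver', 'name', 'prd', 'loc']
df1 = pd.DataFrame([
    ['a', 1, 'surya', '1a', 'x'],
    ['a', 1, 'surya', '1a', 'y'],
    ['a', 2, 'ram', '1a', 'x'],
    ['b', 1, 'alex', '1b', 'z'],
    ['b', 1, 'alex', '1b', 'y'],
    ['b', 2, 'david', '1b', 'z'],
], columns=columns)
df2 = df1.iloc[:4].copy()  # df2 in the question equals the first four rows of df1

# True where a cell differs from (or has no counterpart in) df2 at the same index label
mask = ~df1.isin(df2)

# Unflagged cells become NaN; dropna() then keeps only rows where every cell was flagged
rows_only_in_df1 = df1[mask].dropna()
print(rows_only_in_df1)
```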

Let's consider a quick example:

import pandas as pd

df1 = pd.DataFrame({
    'Buyer': ['Carl', 'Carl', 'Carl'],
    'Quantity': [18, 3, 5]})

#    Buyer  Quantity
# 0  Carl        18
# 1  Carl         3
# 2  Carl         5

df2 = pd.DataFrame({
    'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'],
    'Quantity': [2, 1, 18, 5]})

#    Buyer  Quantity
# 0  Carl         2
# 1  Mark         1
# 2  Carl        18
# 3  Carl         5


~df2.isin(df1)

#    Buyer  Quantity
# 0  False  True
# 1  True   True
# 2  False  True
# 3  True   True


df2[~df2.isin(df1)].dropna()

#   Buyer   Quantity
# 1 Mark    1
# 3 Carl    5
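To compare rows by content rather than by index label, one option (an alternative, not part of the original answer) is to turn each row into a composite key with `pd.MultiIndex.from_frame` and test membership with `MultiIndex.isin`. A sketch on the same toy data:

```python
import pandas as pd

df1 = pd.DataFrame({'Buyer': ['Carl', 'Carl', 'Carl'], 'Quantity': [18, 3, 5]})
df2 = pd.DataFrame({'Buyer': ['Carl', 'Mark', 'Carl', 'Carl'], 'Quantity': [2, 1, 18, 5]})

# Each row becomes one tuple-like key, e.g. ('Carl', 5); row order no longer matters
keys1 = pd.MultiIndex.from_frame(df1)
keys2 = pd.MultiIndex.from_frame(df2)

# Boolean array: does each df2 row appear anywhere in df1?
in_df1 = keys2.isin(keys1)

only_in_df2 = df2[~in_df1]
print(only_in_df2)  # rows 0 (Carl, 2) and 1 (Mark, 1)
```

Here `Carl, 5` is no longer reported, since the same values exist in `df1` at a different position.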

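Keep in mind that when `isin` is given a DataFrame, it matches on index and column labels, so the check is positional: row 3 (`Carl`, `5`) is flagged even though the same values sit at index 2 of `df1`. Another idea is to merge on the shared column names, which matches rows by their values instead of their position.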
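A sketch of the merge idea on the question's data, assuming the composite key (id, ver, name, prd, loc) is unique within each frame. `indicator=True` adds a `_merge` column that labels every row as `both`, `left_only` (df1 only) or `right_only` (df2 only), and `validate='one_to_one'` makes pandas raise if the key turns out to be duplicated.

```python
import pandas as pd

key = ['id', 'ver', 'name', 'prd', 'loc']  # composite primary key from the question
df1 = pd.DataFrame([
    ['a', 1, 'surya', '1a', 'x'],
    ['a', 1, 'surya', '1a', 'y'],
    ['a', 2, 'ram', '1a', 'x'],
    ['b', 1, 'alex', '1b', 'z'],
    ['b', 1, 'alex', '1b', 'y'],
    ['b', 2, 'david', '1b', 'z'],
], columns=key)
df2 = df1.iloc[:4].copy()  # df2 in the question equals the first four rows of df1

# Full outer join on the whole composite key, tagging where each row came from
comparison = df1.merge(df2, on=key, how='outer', indicator=True, validate='one_to_one')

only_in_df1 = comparison.loc[comparison['_merge'] == 'left_only', key]
only_in_df2 = comparison.loc[comparison['_merge'] == 'right_only', key]
print(only_in_df1)
print(only_in_df2)  # empty for this data
```

If the real frames also carry non-key columns, those come through with the default `_x`/`_y` suffixes and can be compared column by column on the rows where `_merge == 'both'`.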

Feel free to tweak the code to your needs. Hope this helps :)

A. Nadjar