0

I am trying to compare two excel file that are different size, one has 5701 row and the other has 5904 row. The columns are Price and Project Description. I am trying to compare by text to see what the project difference are.


import pandas as pd
import numpy as np

df = pd.read_csv('C:/Users/Text/Downloads/D1.csv')
df2 = pd.read_csv('C:/Users/Text/Downloads/D2.csv')


df['text_match'] = np.where(df['Project ID'] == df2['Project ID'], 'True', 'False')
print(df.loc[(df['text_match'] == 'False')])

I get the following error when I try to run the code:

raise ValueError("Can only compare identically-labeled Series objects")
ValueError: Can only compare identically-labeled Series objects
MDL7833
  • 1
  • 3

2 Answers2

0

To compare 2 dfs - the numbers of records must be the same:

You can use:

df1.equals(df2)

then get - false, there isn't option to compare.

There is option to use:

df1.reset_index(drop=True) == df2.reset_index(drop=True)

There is another option - cut the second df to (5701 rows)

# Number of rows to drop
n = 203
 
# Removing last n rows
df2 = df2.iloc[:-n]
Piotr Żak
  • 2,046
  • 5
  • 18
  • 30
0

You can use the Pandas .compare() function, as follows:

df.compare(df2)

Sample output is as follows:

  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0

It highlight the differences with self for df and other for df2

You can just compare part of the columns by e.g.

df[['Project ID', 'Price']].compare(df2[['Project ID', 'Price']])

Or, if you just want to compare only the Project ID column:

df['Project ID'].compare(df2['Project ID'])

Another method is to try filtering for unmatched Project ID, using .isin(), as follows:

df.loc[~df['Project ID'].isin(df2['Project ID'])]

and:

df2.loc[~df2['Project ID'].isin(df['Project ID'])]
SeaBean
  • 22,547
  • 3
  • 13
  • 25
  • I tried the df1.compare(df2), it receive the following error: ValueError: Can only compare identically-labeled DataFrame objects – MDL7833 Oct 05 '21 at 15:33
  • @MDL7833 The 2 dataframes are of different column labels ? You can rename one of them first – SeaBean Oct 05 '21 at 15:34
  • @MDL7833 You can also limit the number of columns to compare. See my edit above. – SeaBean Oct 05 '21 at 15:38
  • @MDL7833 See also alternative approach to filter unmatched `Project ID` by `.isin()`. – SeaBean Oct 05 '21 at 15:48