compare two data frames and get only non matching values with index and column names pandas dataframe python

Question

df1-

ID Name  Number
0  AAA    123
1  BBB    456
2  CCC    789

df2-

ID Name  Number
0  AAA    123
1  BBB    456
2  CCC    **963**    <----- Non Matching value

I want to compare the two data frames df1 and df2 above and get the result in the format below: only the non-matching value, with its column name.

expected output:
ID Number                        
2  963

Can anyone help me with the code? I am new to pandas. Thank you so much.

So you don't have multiple columns and can accept specifying `[['ID', 'Number']]` now? — SeaBean, May 25 '21 at 07:24
@seabean Thanks. What I posted here is dummy data; my real data has multiple columns and rows, and I want only the specific non-matching values, identified by index and column name. — Amaresh puri, May 25 '21 at 07:35
@seabean Thanks. It is converting the numbers to float type in the result; can we fix this as well? — Amaresh puri, May 25 '21 at 08:18
It's the default behavior of Pandas to treat columns containing NaN values as float type, hence the float display. See my 2nd edit below to trim the decimal points. Keep in mind that the resulting numbers are strings, so you can't match them against integers afterwards; this is for cosmetic display only. — SeaBean, May 25 '21 at 08:46
If we convert the result to integer type, the `NaN` or blank entries would have to become `0`, which would make the differences harder to identify. — SeaBean, May 25 '21 at 08:48
I have added another version of the code that keeps the numbers as integers alongside N/A values. This version is also more concise. — SeaBean, May 25 '21 at 12:46

SeaBean · Accepted Answer · 2021-05-25T12:05:40.077

You can use `.merge()` with `indicator=True` and filter the result on the `_merge` column, as follows:

df3 = df2.merge(df1, how='left', indicator=True)
df3[df3['_merge'] == 'left_only'][['ID', 'Number']]

Result:

   ID  Number
2   2     963
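
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the merge above. The constructor calls are my assumption about how the data is laid out, with `ID` as an ordinary column. Because no `on=` is given, `merge` joins on all shared columns, so a row of `df2` is marked `left_only` whenever any of its values has no exact match in `df1`.

```python
import pandas as pd

# Sample data from the question (layout assumed: ID is a regular column).
df1 = pd.DataFrame({
    "ID": [0, 1, 2],
    "Name": ["AAA", "BBB", "CCC"],
    "Number": [123, 456, 789],
})
df2 = pd.DataFrame({
    "ID": [0, 1, 2],
    "Name": ["AAA", "BBB", "CCC"],
    "Number": [123, 456, 963],
})

# Left-merge df2 onto df1 using every shared column as a key.
# indicator=True adds a "_merge" column: "both" or "left_only".
df3 = df2.merge(df1, how="left", indicator=True)

# Rows of df2 that have no exact counterpart in df1.
result = df3.loc[df3["_merge"] == "left_only", ["ID", "Number"]]
print(result)
```

Note that this flags whole rows rather than individual cells, so it tells you which rows of `df2` changed but not which column changed.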

Edit

If you have multiple columns and would not like to specify the column names to highlight the differences, you can try:

df2[(df1 != df2)].dropna(how='all', axis=1).dropna(how='all', axis=0)

Demo

df1

   ID Name  Number1  Number2  Number3
0   0  AAA      123       12     1111
1   1  BBB      456       22     2222
2   2  CCC      789       32     3333


df2

   ID Name  Number1  Number2  Number3
0   0  AAA      123       12     1111
1   1  BBB      456       22     2255
2   2  CCC      963       32     3333


df2[df1 != df2].dropna(how='all', axis=1).dropna(how='all', axis=0)


   Number1  Number3
1      NaN   2255.0
2    963.0      NaN

The non-NaN values are the ones that differ. The label on the left is the row index, which coincides with the ID here.
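
A runnable sketch of the demo, in case it helps. The frame construction and the names `mask`, `diff` and `mismatches` are mine, not part of the original answer. The last step is an optional extra that lists each mismatch with its index label and column name, which is closer to what the question asks for. It assumes both frames carry identical index and column labels, since `df1 != df2` raises otherwise.

```python
import numpy as np
import pandas as pd

# Demo data: df2 is df1 with two cells changed.
df1 = pd.DataFrame({
    "ID": [0, 1, 2],
    "Name": ["AAA", "BBB", "CCC"],
    "Number1": [123, 456, 789],
    "Number2": [12, 22, 32],
    "Number3": [1111, 2222, 3333],
})
df2 = df1.copy()
df2.loc[1, "Number3"] = 2255
df2.loc[2, "Number1"] = 963

# Element-wise comparison: True where a cell differs.
mask = df1 != df2

# Keep only the changed cells of df2 (everything else becomes NaN),
# then drop columns and rows that contain no change at all.
diff = df2[mask].dropna(how="all", axis=1).dropna(how="all", axis=0)
print(diff)

# Optional: one row per mismatch, with index label, column name and both values.
rows, cols = np.where(mask.to_numpy())
mismatches = pd.DataFrame({
    "index": df2.index[rows],
    "column": df2.columns[cols],
    "df1": df1.to_numpy()[rows, cols],
    "df2": df2.to_numpy()[rows, cols],
})
print(mismatches)
```

One caveat: `NaN != NaN` evaluates to True, so a cell that is missing in both frames is still counted as a difference in `mask` and would show up in `mismatches`.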

Edit 2

If your numbers are all integers and you don't want Pandas to show the integers as float type together with NaN values, you can use:

df2[df1 != df2].dropna(how='all', axis=1).dropna(how='all', axis=0).fillna('').astype(str).replace(r'\.0$', '', regex=True)


  Number1 Number3
1            2255
2     963

Or, simply use:

df2[df1 != df2].dropna(how='all', axis=1).dropna(how='all', axis=0).astype('Int64')


   Number1  Number3
1     <NA>     2255
2      963     <NA>
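
A self-contained sketch of the `Int64` variant, using the same demo data (the frame construction and variable names are my own). `Int64` with a capital I is pandas' nullable integer dtype, which can hold `<NA>` next to integers, so the values stay numeric. I would expect the cast to fail if a changed column holds text or non-integer floats, so treat this as something that applies to integer data only.

```python
import pandas as pd

# Demo data: df2 is df1 with two integer cells changed.
df1 = pd.DataFrame({
    "ID": [0, 1, 2],
    "Name": ["AAA", "BBB", "CCC"],
    "Number1": [123, 456, 789],
    "Number2": [12, 22, 32],
    "Number3": [1111, 2222, 3333],
})
df2 = df1.copy()
df2.loc[1, "Number3"] = 2255
df2.loc[2, "Number1"] = 963

# Changed cells only; the remaining columns are float because of the NaN gaps.
diff = df2[df1 != df2].dropna(how="all", axis=1).dropna(how="all", axis=0)

# Convert to the nullable integer dtype: NaN becomes <NA>, 963.0 becomes 963.
diff_int = diff.astype("Int64")
print(diff_int)
print(diff_int.dtypes)
```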

Hi SeaBean, thank you so much. Say we have more than 100 columns, so it is not practical to list them like `[['ID', 'Number']]`. Can we get the same result without specifying the header names? Thanks. — Amaresh puri, May 25 '21 at 06:39
@Amareshpuri If you don't mind seeing other columns, you can just use `df3[df3['_merge'] == 'left_only']` — SeaBean, May 25 '21 at 06:41
@Amareshpuri So you could have multiple columns, and you want to display only the non-matching columns? That is a totally different question. Please update your question with sample data containing multiple columns. Thanks! I will try to work on that. — SeaBean, May 25 '21 at 06:45
Wonderful, thanks. Yes, I will update the question. It is converting the numbers to float type in the result; can we fix this as well? Thanks — Amaresh puri, May 25 '21 at 07:44

score 0 · Answer 2 · answered May 25 '21 at 07:16

You can use the following:

df2[df1.Number != df2.Number][['ID', 'Number']]

valentin

score 0 · Answer 3 · answered Aug 05 '22 at 21:39

You can extract whatever data you need from the output, which contains the details of all mismatches.
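
The answer does not show which call produces that output. As an assumption on my part, one built-in that returns a table of all mismatches is `DataFrame.compare` (available in pandas 1.1 or later). A minimal sketch with demo data I made up for illustration; it is not necessarily what this answer had in mind:

```python
import pandas as pd

# Hypothetical demo data: df2 is df1 with two cells changed.
df1 = pd.DataFrame({
    "ID": [0, 1, 2],
    "Name": ["AAA", "BBB", "CCC"],
    "Number1": [123, 456, 789],
    "Number3": [1111, 2222, 3333],
})
df2 = df1.copy()
df2.loc[1, "Number3"] = 2255
df2.loc[2, "Number1"] = 963

# compare() keeps only rows and columns with at least one difference.
# Columns get a second level: "self" (df1 value) and "other" (df2 value).
report = df1.compare(df2)
print(report)

# Pull out just the df2 side of each mismatch.
changed = report.xs("other", axis=1, level=1)
print(changed)
```

Like the `!=` approach, `compare` needs both frames to have identical index and column labels.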

Arpan Saini