0

I have looked through all the internets and pandas docs for this.

I have 2 pandas DataFrames (with string values) They actually have 14 columns and 1667 rows.

I need to compare them and return the value or values that are in the longer one that are not in the shorter one

   df1
0 X
1 Y
2 Z

   df2
0 X
1 Y

Result:

   df_diff
2 Z

I tried it with different merges and joins and maybe I am just stupid, but could not make it work. This is part of a function that compares 2 dataframes and returns all the non-equal rows in new dataframe. If you have the solution to the entire function feel free to post it as answer.

Karl
  • 146
  • 1
  • 1
  • 12
  • Anti-Merge does NOT work – Karl Mar 26 '19 at 16:36
  • How are you using it? You need a simple `isin` – yatu Mar 26 '19 at 16:40
  • isin does not work simply because I have multiple columns (14) , I could probably also use unique instead of isin but again does not make sense for multiple columns (sorry was unclear by the example above) , my main issue is that I need to know what value in the 14 columns of each row is different and where it is. Isin would just tell me if value „a“ is in the DataFrame. I have tried isin as I have found „anti-merge“ and similar questions. It does not do the trick. – Karl Mar 26 '19 at 17:16
  • Maybe I am just unclear: I have 2 tables with 14 columns and 1667 rows. I need to write a script that compares them row by row to check what value is different in what row for each column. It’s a user management table that I have to match. So if one user is in df1 and in df2 BUT in another column it says enabled instead of disabled I would have to know where that is. Or if user is in d1 and in d2 but in another column „lastname“ his lastname ist spelled different from df1 to df2 I would also have to know – Karl Mar 26 '19 at 17:20
  • Another idea I have was to use iterrows() but I could‘t het it to work and the the next issue is that the tables do not have to be equal length (if one or more users are missing on one table) , so I thought i would write an if statement to check and decide which way to use based on if they are equal length ...but again couldn’t work with iterrows on 2 dataframes... @yatu – Karl Mar 26 '19 at 17:24

1 Answers1

0

Would this work?

set(df_1['columname']) - set(df_2['columnname'])

if you want differences of the complete dataset, you could do:

set(df_1.values.flatten()) - set(df_2.values.flatten())
warped
  • 8,947
  • 3
  • 22
  • 49
  • No as I would have to compare all columns (sorry was not clear because of the example) – Karl Mar 26 '19 at 16:35
  • Anti-Merge does not work... – Karl Mar 26 '19 at 16:38
  • No still does not work as I will not know where that value is in my dataset, I have 1667 rows in a table in my database , as this would tell me what value is missing, it does not tell me where and I also need to know if it is the only one on the row that is different but that was the end part of my question so if you could write it so I know the position in the DataFrames where that value is it ll be the answer , sadly sets and lists don’t keep order – Karl Mar 26 '19 at 17:11