How can I subtract two data frames?

I have two data frames, A and B, and I want to subtract them (A - B) so that the rows of B that are in A are deleted from A. In other words, I want to delete the rows of A that are repeated in B. For example:

[image: example data frames A and B]

Then I need a function that computes A - B and gives the following result:

[image: expected result of A - B]

Do you know of a function or method in pandas that does this?

Hadi Taj
  • You can use [`isin`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.isin.html): `A[~A['name'].isin(B['name'])]` – sushanth Jul 12 '20 at 05:04
  • Here is a dupe, https://stackoverflow.com/a/19960116/4985099 – sushanth Jul 12 '20 at 05:06

1 Answer

You can achieve this "subtraction" as follows:

dfA = dfA[~dfA['name'].isin(dfB['name'])]

If you need the index to run consecutively, as in your expected output, you can reset it:

dfA = dfA.reset_index(drop=True)
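
As a minimal sketch of the two steps above, here is a small runnable example. The sample data and the `value` column are assumptions made up for illustration; only the `name` column comes from the discussion.

```python
import pandas as pd

# Hypothetical sample data, not the OP's actual frames.
dfA = pd.DataFrame({"name": ["a", "b", "c", "d"], "value": [1, 2, 3, 4]})
dfB = pd.DataFrame({"name": ["b", "d"], "value": [2, 40]})

# Keep only the rows of dfA whose 'name' does not occur in dfB['name'],
# then renumber the index from 0.
result = dfA[~dfA["name"].isin(dfB["name"])].reset_index(drop=True)
print(result)
```

Note that the row named `d` is dropped even though its `value` differs between the two frames, because only the `name` column is compared.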

EDIT: (the OP clarified that a row should be removed only when all of its columns match)

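First, find the identical rows, carrying dfA's original index through the merge (a plain merge returns a fresh 0..n-1 index; this assumes dfA's index is unnamed and unique):

df_common = dfA.reset_index().merge(dfB, how='inner').set_index('index')

Then, use that index to drop the matching rows from dfA:

dfA = dfA[~dfA.index.isin(df_common.index)]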
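
Another way to drop only fully matching rows is a left merge with `indicator=True`, sometimes called an anti-join. The sketch below is one possible approach rather than the method used above, and it assumes both frames have the same columns. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; both frames share the same columns.
dfA = pd.DataFrame({"name": ["a", "b", "b", "c"], "value": [1, 2, 20, 3]})
dfB = pd.DataFrame({"name": ["b", "c"], "value": [2, 3]})

# Left-join on all shared columns; the '_merge' column records whether
# each row of dfA also has a full-row match in dfB.
merged = dfA.merge(dfB.drop_duplicates(), how="left", indicator=True)

# Rows marked 'left_only' exist in dfA but not in dfB.
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
result = result.reset_index(drop=True)
print(result)
```

Here the row `("b", 20)` survives because only its `name` matches a row of `dfB`, not the whole row.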
Christopher
  • Thanks, but this does not do what I need. With your suggested code, every row of data frame A that has the same value in the "name" column as a row of data frame B is deleted. I want to delete a row from A only when all of its values (the values in every column) are the same as in a row of B, that is, when the two rows are identical in all columns. – Hadi Taj Jul 12 '20 at 05:36
  • @HadiTaj, you should clarify that in your question. I've updated my answer based on what you have clarified. – Christopher Jul 12 '20 at 07:04
  • Thanks for your attention, but it is not possible to find the identical rows this way; the data frame is too large. Let me change my question. I actually have two text data set files in CSV format. The first, bigger one has more than 125,000 rows (instances or records), and the smaller one, which is a subset of the bigger one, has 25,000 rows. I want to create a new data set from the bigger one that does not contain the rows of the smaller one. So I need to load them in my code, convert them to data frames, and then subtract the smaller one from the bigger one. – Hadi Taj Jul 12 '20 at 11:37
  • @HadiTaj, sorry I couldn't help then. I don't understand why my solution doesn't work for you. You provided too little information, and you didn't tell me why my solution doesn't work. Also, if you need end-to-end code, you may need to go to a freelancer site to find a coding service. Stack Overflow is not meant to provide coding services. – Christopher Jul 12 '20 at 12:39
  • Excuse me, sir. Perhaps I did not explain my problem correctly. Anyway, I found the solution. We can do it in two steps: first concatenate the two data frames, then drop the duplicate rows: `df_C = pd.concat([df_A, df_B])`, then `df_C = df_C.drop_duplicates(keep=False)`. Thanks a lot for your help. – Hadi Taj Jul 12 '20 at 14:26
  • @HadiTaj, that is called concatenation, not subtraction... – Christopher Jul 12 '20 at 23:27
  • It is not just concatenation. I found that I can do it in two steps: the first step is concatenation and the second is dropping duplicates. Anyway, thank you again for your help. – Hadi Taj Jul 13 '20 at 20:57
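
The two-step approach from the comments can be sketched as follows. The sample data is hypothetical. This approach relies on `df_B` being a subset of `df_A` (as the OP describes) and on `df_A` having no duplicate rows of its own. With `keep=False`, every row that occurs more than once in the concatenated frame is dropped, so rows found only in `df_B` would remain and rows duplicated within `df_A` would disappear.

```python
import pandas as pd

# Hypothetical sample data; df_B is a subset of df_A, as in the OP's case.
df_A = pd.DataFrame({"name": ["a", "b", "c", "d"], "value": [1, 2, 3, 4]})
df_B = pd.DataFrame({"name": ["b", "d"], "value": [2, 4]})

# Step 1: stack both frames. Step 2: drop every row that occurs more than once.
df_C = pd.concat([df_A, df_B]).drop_duplicates(keep=False)
df_C = df_C.reset_index(drop=True)
print(df_C)
```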