
(This is not as simple as removing duplicates with drop_duplicates, so this question is not a duplicate.)

I have the following dataframe:

   number       doc    type    sum
0      15  document  credit  10000
1      15       doc  credit   9999
2      16  passport   debit  20000
3      16       pas   debit  19999
4      16  passport   debit  25000

A row is considered a duplicate if its number and type are equal to those of another row while its doc and sum are both different. I need to remove such rows by specifying this condition somehow.

The result of removing duplicates should be:

   number       doc    type    sum
0      15  document  credit  10000
2      16  passport   debit  20000
4      16  passport   debit  25000

How can I achieve that?

Flakee
  • @DYZ I think it is not as simple as just removing duplicates; the other columns have to be different – Dani Mesejo Jul 29 '22 at 07:12
  • Why is row 4 in the output if it has the same number and type as the first one and a different doc and sum? – Dani Mesejo Jul 29 '22 at 07:23
  • @DaniMesejo because the doc column is the same, but it HAS to be different – Flakee Jul 29 '22 at 07:25
  • @DaniMesejo it may sound stupid, but the same passport in doc is not a duplicate in this case – Flakee Jul 29 '22 at 07:27
  • What if you have multiple types of docs for the same number and type, for example if the last row (row 4) had doc instead of passport? What would the output be? – Dani Mesejo Jul 29 '22 at 08:02
  • In this case the output will be rows 0 and 2; row 4 will be considered a duplicate. If number and type are equal, I don't care whether there is an existing doc or an existing sum. I am to consider it a duplicate and keep the first hit – Flakee Jul 29 '22 at 08:36
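
One possible reading of the rule discussed above, sketched in pandas: within each (number, type) group, treat the first row as the reference, and drop any later row whose doc and sum both differ from it. This is only an interpretation based on the example and the comments, not a confirmed solution. The function name drop_conditional_duplicates is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "number": [15, 15, 16, 16, 16],
    "doc": ["document", "doc", "passport", "pas", "passport"],
    "type": ["credit", "credit", "debit", "debit", "debit"],
    "sum": [10000, 9999, 20000, 19999, 25000],
})


def drop_conditional_duplicates(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose doc AND sum both differ
    from the first row that shares their number and type."""
    # For every row, look up the doc and sum of the first row in its group.
    first = frame.groupby(["number", "type"])[["doc", "sum"]].transform("first")
    # Assumption: a row is a duplicate only when both columns differ
    # from that first row ("keep the first hit").
    is_duplicate = (frame["doc"] != first["doc"]) & (frame["sum"] != first["sum"])
    return frame[~is_duplicate]


result = drop_conditional_duplicates(df)
print(result)
```

On the sample data this keeps rows 0, 2 and 4. If row 4 had doc instead of passport, both of its columns would differ from row 2, so only rows 0 and 2 would remain, which matches the last comment.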

0 Answers