1

How can I remove the duplicate entries in the pandas DataFrame given below?

a   b   c   d
11216   08-08-2018  2000    SIP
40277   28-08-2018  1000    SIP
44165   02-08-2018  8000    Lump
44165   03-08-2018  5000    Lump
45845   16-08-2018  25000   Lump
45845   18-08-2018  50000   Lump
52730   13-08-2018  10000   Lump
52730   27-08-2018  10000   Lump
53390   20-08-2018  400000  Lump
56180   02-08-2018  1000    Lump
58537   11-07-2018  5000    Lump
58537   22-08-2018  2000    SIP
912813  15-08-2018  160001  Lump
912813  15-08-2018  6000    SIP
85606   16-08-2018  3500    SIP
88327   06-08-2018  5000    SIP
90240   07-08-2018  2000    SIP

Desired result:

a   b   c   d
11216   08-08-2018  2000    SIP
40277   28-08-2018  1000    SIP
44165   02-08-2018  8000    Lump
45845   16-08-2018  25000   Lump
52730   13-08-2018  10000   Lump
53390   20-08-2018  400000  Lump
58537   11-07-2018  5000    Lump
912813  15-08-2018  160001  Lump
912813  15-08-2018  6000    SIP
85606   16-08-2018  3500    SIP
88327   06-08-2018  5000    SIP
90240   07-08-2018  2000    SIP

The condition is: remove a row if its a matches an earlier row's a (a2 == a1) but its b is different (b2 != b1).

jpp
  • 159,742
  • 34
  • 281
  • 339

2 Answers

1

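You can sort by a and b, then use duplicated to build two masks joined with a logical OR: keep the first row for each a, plus any rows whose a and b both match another row: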

res = df.sort_values(['a', 'b'])\
        .loc[(~df['a'].duplicated()) | df[['a', 'b']].duplicated(keep=False)]

print(res)

         a           b       c     d
0    11216  08-08-2018    2000   SIP
1    40277  28-08-2018    1000   SIP
2    44165  02-08-2018    8000  Lump
4    45845  16-08-2018   25000  Lump
6    52730  13-08-2018   10000  Lump
8    53390  20-08-2018  400000  Lump
9    56180  02-08-2018    1000  Lump
10   58537  11-07-2018    5000  Lump
14   85606  16-08-2018    3500   SIP
15   88327  06-08-2018    5000   SIP
16   90240  07-08-2018    2000   SIP
12  912813  15-08-2018  160001  Lump
13  912813  15-08-2018    6000   SIP
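
As an alternative sketch (not the snippet above), the same condition can be written with groupby and transform. It assumes that the row to keep for each a is the one with the earliest date, and that b is a day-first date string; the small DataFrame is a hypothetical subset of the question's data.

```python
import pandas as pd

# Hypothetical subset of the question's data, enough to cover each case.
df = pd.DataFrame({
    "a": [11216, 44165, 44165, 58537, 58537, 912813, 912813],
    "b": ["08-08-2018", "02-08-2018", "03-08-2018", "11-07-2018",
          "22-08-2018", "15-08-2018", "15-08-2018"],
    "c": [2000, 8000, 5000, 5000, 2000, 160001, 6000],
    "d": ["SIP", "Lump", "Lump", "Lump", "SIP", "Lump", "SIP"],
})

# Parse b as day-first dates so comparisons are chronological,
# not lexicographic on the raw strings.
dates = pd.to_datetime(df["b"], format="%d-%m-%Y")

# Earliest date for each value of a, broadcast back to every row.
first_date = dates.groupby(df["a"]).transform("min")

# Keep rows whose date equals that earliest date; rows that share a
# with an earlier row but carry a different b are dropped.
res = df[dates == first_date]
print(res)
```

Parsing the dates matters here because sorting or comparing the raw dd-mm-yyyy strings orders them by day first, which is only chronological when all rows fall in the same month.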
jpp
  • In addition to the above, I need to add an if condition: if the result is greater than 5000 it is good, and if it is lower than 3000 it is bad. – Karthikai Vasan Sep 01 '18 at 14:44
  • @KarthikaiVasan, please don't change your question; I've rolled it back. If you have a new question, you should ask it separately. – jpp Sep 01 '18 at 15:24
0

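First, copy each column into a list; the loop below then removes the duplicated items that match your condition. It compares neighbouring rows only, so it assumes that rows with the same a are next to each other.

# Copy each column of the DataFrame into its own list.
a, b, c, d = (df[col].tolist() for col in ['a', 'b', 'c', 'd'])

i = 0
while i < len(a) - 1:
    # Same a as the next row but a different b: drop the later row (i + 1).
    if a[i] == a[i + 1] and b[i] != b[i + 1]:
        del a[i + 1]
        del b[i + 1]
        del c[i + 1]
        del d[i + 1]
    else:
        # Only advance when nothing was deleted, so the new neighbour is checked too.
        i += 1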
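
A possible variation on the loop above, as a minimal sketch: instead of deleting from four parallel lists in place, build a new list and remember the first b seen for each a in a dict. The rows list and the names first_b and kept are illustrative assumptions, not part of the original answer, and the data is a hypothetical subset of the question's rows.

```python
# Rows as (a, b, c, d) tuples; a hypothetical subset of the question's data.
rows = [
    (44165, "02-08-2018", 8000, "Lump"),
    (44165, "03-08-2018", 5000, "Lump"),
    (912813, "15-08-2018", 160001, "Lump"),
    (912813, "15-08-2018", 6000, "SIP"),
    (85606, "16-08-2018", 3500, "SIP"),
]

first_b = {}  # maps each a to the first b seen for it
kept = []     # rows that survive the filter, in their original order

for row in rows:
    a, b = row[0], row[1]
    # setdefault stores b the first time a appears and returns the stored b,
    # so the comparison is always against the first b for this a.
    if first_b.setdefault(a, b) == b:
        kept.append(row)

print(kept)
```

Unlike the neighbour comparison, this version does not need rows with the same a to be adjacent, and it avoids the cost of repeated del calls on long lists.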