
I am working on a dataset that clearly contains duplicates, but when I use `df.duplicated()` it returns False because the time column is unique. How can I find duplicates in columns A, B and C based on the time difference between them? For example, if the time difference is less than 200 ms, delete the duplicates.
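`DataFrame.duplicated` compares all columns unless you pass `subset`, which is why a unique `time` column makes every row look distinct. Here is a minimal illustration with hypothetical data: three rows, two of which share A, B and C but have different timestamps.

```python
import pandas as pd

# Hypothetical rows: rows 0 and 1 have identical A, B, C but distinct timestamps
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [11, 11, 12],
    'C': [101, 101, 103],
    'time': pd.to_datetime(['2018-03-22 14:00:00.000',
                            '2018-03-22 14:00:00.150',
                            '2018-03-22 14:00:00.300']),
})

# Default: every column is compared, so the unique 'time' values make all rows distinct
print(df.duplicated().tolist())                                    # [False, False, False]

# subset restricts the comparison to A, B, C; keep='first' marks only later repeats
print(df.duplicated(subset=['A', 'B', 'C']).tolist())              # [False, True, False]

# keep=False marks every member of a duplicate group
print(df.duplicated(subset=['A', 'B', 'C'], keep=False).tolist())  # [True, True, False]
```

This only ignores `time`; deciding whether two matching rows are close enough in time still needs a separate step.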

Sample of my data

Hajer
Welcome to StackOverflow. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Mar 22 '18 at 14:24

1 Answer


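If I understand correctly, you could group the rows into fixed 30-minute time bins with `pd.Grouper` and, within each bin, flag every row whose A, B and C values are repeated: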

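```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A': np.random.randint(1, 3, 48), 'B': np.random.randint(11, 13, 48),
                   'C': np.random.randint(101, 113, 48),
                   'time': pd.date_range('2014-09-10', periods=48, freq='10min')})  # one row every 10 minutes

# Within each fixed 30-minute bin, mark every row whose (A, B, C) values occur more than once in that bin
dups = df.groupby(pd.Grouper(key='time', freq='30min'), group_keys=False).apply(
    lambda x: x.duplicated(['A', 'B', 'C'], keep=False))
df.join(dups.rename('dups'))
```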

Output:

    A   B    C                time   dups
0   1  11  110 2014-09-10 00:00:00  False
1   2  11  103 2014-09-10 00:10:00  False
2   1  12  105 2014-09-10 00:20:00  False
3   1  12  109 2014-09-10 00:30:00  False
4   1  11  102 2014-09-10 00:40:00  False
5   1  11  103 2014-09-10 00:50:00  False
6   1  12  102 2014-09-10 01:00:00  False
7   2  11  102 2014-09-10 01:10:00  False
8   2  12  104 2014-09-10 01:20:00  False
9   1  11  106 2014-09-10 01:30:00  False
10  2  11  110 2014-09-10 01:40:00  False
11  2  12  101 2014-09-10 01:50:00  False
12  1  11  109 2014-09-10 02:00:00  False
13  2  12  112 2014-09-10 02:10:00  False
14  1  11  102 2014-09-10 02:20:00  False
15  2  12  107 2014-09-10 02:30:00  False
16  1  11  104 2014-09-10 02:40:00  False
17  2  11  104 2014-09-10 02:50:00  False
18  2  11  112 2014-09-10 03:00:00  False
19  1  11  106 2014-09-10 03:10:00  False
20  1  12  110 2014-09-10 03:20:00  False
21  1  11  108 2014-09-10 03:30:00  False
22  2  11  110 2014-09-10 03:40:00  False
23  2  12  103 2014-09-10 03:50:00  False
24  2  12  104 2014-09-10 04:00:00   True
25  1  12  112 2014-09-10 04:10:00  False
26  2  12  104 2014-09-10 04:20:00   True
27  1  11  104 2014-09-10 04:30:00  False
28  1  11  109 2014-09-10 04:40:00  False
29  1  11  107 2014-09-10 04:50:00  False
30  1  11  110 2014-09-10 05:00:00  False
31  2  12  108 2014-09-10 05:10:00  False
32  2  12  107 2014-09-10 05:20:00  False
33  2  11  104 2014-09-10 05:30:00  False
34  1  11  110 2014-09-10 05:40:00  False
35  1  11  107 2014-09-10 05:50:00  False
36  2  11  107 2014-09-10 06:00:00  False
37  1  12  112 2014-09-10 06:10:00  False
38  1  11  107 2014-09-10 06:20:00  False
39  2  12  102 2014-09-10 06:30:00  False
40  1  12  111 2014-09-10 06:40:00  False
41  2  11  104 2014-09-10 06:50:00  False
42  1  12  105 2014-09-10 07:00:00  False
43  2  12  104 2014-09-10 07:10:00  False
44  2  12  102 2014-09-10 07:20:00  False
45  2  11  101 2014-09-10 07:30:00  False
46  1  12  106 2014-09-10 07:40:00  False
47  1  12  109 2014-09-10 07:50:00  False
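The `pd.Grouper(key='time', freq='30min')` step puts rows into fixed bins that start at midnight. The same flag can be sketched without `groupby.apply` by labelling each row with the start of its bin via `Series.dt.floor` and passing that label to `duplicated` together with A, B and C; `bin` is a hypothetical helper column name. Because the bins are fixed, two matching rows on either side of a bin edge (for example 00:20 and 00:30) are not flagged even though they are only 10 minutes apart.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A': np.random.randint(1, 3, 48),
                   'B': np.random.randint(11, 13, 48),
                   'C': np.random.randint(101, 113, 48),
                   'time': pd.date_range('2014-09-10', periods=48, freq='10min')})

# Start of each row's 30-minute bin. floor() aligns to the epoch, and 30 minutes divides
# a day evenly, so these bins coincide with the midnight-anchored pd.Grouper bins here.
bin_start = df['time'].dt.floor('30min')

# A row is flagged when another row in the same bin has identical A, B, C
df['dups'] = df.assign(bin=bin_start).duplicated(['A', 'B', 'C', 'bin'], keep=False)
print(df.loc[df['dups']])
```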
Scott Boston
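To apply a threshold on the time gap itself (the question's 200 ms) instead of fixed bins, one option is to sort by A, B, C and time, take the gap to the previous row with the same (A, B, C) via `groupby(...).diff()`, and drop rows whose gap is below the threshold. This is a sketch that assumes `time` is already a datetime column and that "duplicate" means identical A, B and C; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample: rows 0/1 and 3/4 share A, B, C and are close in time
df = pd.DataFrame({
    'A': [1, 1, 2, 1, 1],
    'B': [11, 11, 12, 11, 11],
    'C': [101, 101, 105, 101, 101],
    'time': pd.to_datetime(['2018-03-22 14:00:00.000', '2018-03-22 14:00:00.150',
                            '2018-03-22 14:00:00.100', '2018-03-22 14:00:01.000',
                            '2018-03-22 14:00:01.120']),
})

threshold = pd.Timedelta(milliseconds=200)

# Sort so rows with the same A, B, C are adjacent and in time order
df = df.sort_values(['A', 'B', 'C', 'time'])

# Gap to the previous row with the same (A, B, C); NaT for the first row of each key
gap = df.groupby(['A', 'B', 'C'])['time'].diff()

# NaT < threshold evaluates to False, so the first row of each key is always kept;
# any row arriving less than 200 ms after an identical (A, B, C) row is dropped
deduped = df[~(gap < threshold)].sort_index()
print(deduped)
```

Each row is compared with the immediately preceding row of the same key, so a run of rows spaced 150 ms apart collapses to its first row even when the whole run spans more than 200 ms. If a row should be dropped only when it is within 200 ms of the last *kept* row, that requires a sequential pass over each group.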