How to get unique rows from dataframe, while keeping the one with max value in some column?

Question

I have a dataframe (from the following csv):

load,timestamp,timestr
0,1576147339.49,124219
0,1576147339.502,124219
2,1576147339.637,124219
1,1576147339.641,124219
9,1576147339.662,124219
8,1576147339.663,124219
7,1576147339.663,124219
6,1576147339.663,124219
5,1576147339.663,124219
4,1576147339.663,124219
3,1576147339.663,124219
2,1576147339.663,124219
1,1576147339.663,124219
0,1576147339.663,124219
0,1576147339.673,124219
3,1576147341.567,124221
2,1576147341.568,124221
1,1576147341.569,124221
0,1576147341.57,124221
4,1576147341.581,124221
3,1576147341.581,124221

I would like to drop duplicates on the timestamp column, keeping the row whose 'load' value is largest.

In this case, the expected result is:

load,timestamp,timestr
0,1576147339.49,124219
0,1576147339.502,124219
2,1576147339.637,124219
1,1576147339.641,124219
9,1576147339.662,124219
8,1576147339.663,124219
0,1576147339.673,124219
3,1576147341.567,124221
2,1576147341.568,124221
1,1576147341.569,124221
0,1576147341.57,124221
4,1576147341.581,124221

The largest value for 'load' doesn't have to appear first!

What's the best way to do this?

I'm new to pandas, I guess that will do, but how? – Gulzar Dec 15 '19 at 07:41

U13-Forward · Accepted Answer · 2019-12-15T23:49:51.510

Try using groupby:

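# Max 'load' per timestamp; 'first' takes timestr from the same group (join on df['timestr'] aligned by row position instead)
print(df.groupby('timestamp', as_index=False).agg({'load': 'max', 'timestr': 'first'}))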

Output:

       timestamp  load  timestr
0   1.576147e+09     0   124219
1   1.576147e+09     0   124219
2   1.576147e+09     2   124219
3   1.576147e+09     1   124219
4   1.576147e+09     9   124219
5   1.576147e+09     8   124219
6   1.576147e+09     0   124219
7   1.576147e+09     3   124221
8   1.576147e+09     2   124221
9   1.576147e+09     1   124221
10  1.576147e+09     0   124221
11  1.576147e+09     4   124221
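
Since the question is phrased in terms of dropping duplicates, the same result can probably also be reached with sort_values followed by drop_duplicates, which keeps each surviving row whole. The following is a minimal sketch, not part of the original answer. Loading the CSV from an inline string with io.StringIO is an assumption made only so the example runs on its own, and the rows are a subset of the question's data with two groups deliberately reordered so that the largest 'load' is not first.

```python
import io
import pandas as pd

# Subset of the question's rows, inlined for a self-contained run (assumption).
# Two groups are reordered on purpose so the maximum 'load' does not come first.
csv_text = """load,timestamp,timestr
2,1576147339.637,124219
9,1576147339.662,124219
3,1576147339.663,124219
8,1576147339.663,124219
7,1576147339.663,124219
0,1576147339.673,124219
3,1576147341.581,124221
4,1576147341.581,124221
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sort so the largest 'load' comes first, then keep the first row per timestamp.
deduped = (
    df.sort_values('load', ascending=False)
      .drop_duplicates(subset='timestamp', keep='first')
      .sort_values('timestamp')      # restore chronological order
      .reset_index(drop=True)
)
print(deduped)
```

If two rows share both the timestamp and the maximum 'load', this keeps one of them without any guarantee about which.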

edited Dec 15 '19 at 23:49

answered Dec 15 '19 at 07:47

I accepted too soon... I wanted the max ON THE LOAD COLUMN. How do I write that? – Gulzar Dec 15 '19 at 11:31

score 0 · Answer 2 · answered Dec 15 '19 at 08:07

Set the float display precision so the timestamps are distinguishable, then take the per-timestamp max with groupby:

import pandas as pd
pd.options.display.float_format = '{:.3f}'.format  # show timestamps with 3 decimals
df.groupby('timestamp').max()  # per-timestamp max of each remaining column

output:

                load  timestr
timestamp                    
1576147339.490     0   124219
1576147339.502     0   124219
1576147339.637     2   124219
1576147339.641     1   124219
1576147339.662     9   124219
1576147339.663     8   124219
1576147339.673     0   124219
1576147341.567     3   124221
1576147341.568     2   124221
1576147341.569     1   124221
1576147341.570     0   124221
1576147341.581     4   124221
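
One caveat: groupby(...).max() takes the maximum of each column independently, so with more columns the values in one output row may come from different input rows. If whole rows should be kept, one option is to look up the index label of the largest 'load' per timestamp with idxmax and select those rows. This is a minimal sketch on a small frame modelled on the question's data; the 'note' column is hypothetical and exists only to make the difference visible.

```python
import pandas as pd

# Small frame modelled on the question's data; the 'note' column is hypothetical.
df = pd.DataFrame({
    'load':      [3, 8, 7, 0, 3, 4],
    'timestamp': [1576147339.663, 1576147339.663, 1576147339.663,
                  1576147339.673, 1576147341.581, 1576147341.581],
    'note':      ['z', 'a', 'b', 'c', 'y', 'd'],
})

# Column-wise maximum: 'load' and 'note' are maximised separately,
# so a result row can mix values from different input rows.
per_column = df.groupby('timestamp').max()

# Row-wise selection: index label of the largest 'load' in each group,
# then pick exactly those rows from the original frame.
idx = df.groupby('timestamp')['load'].idxmax()
whole_rows = df.loc[idx].reset_index(drop=True)

print(per_column)
print(whole_rows)
```

In the question's data 'timestr' is constant within each timestamp, so both approaches agree there; the distinction only matters once other columns vary inside a group.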
