Slicing from large pandas DataFrame

Asked Aug 21 '20 at 10:41

Active Aug 21 '20 at 10:55

How can I extract a new pandas DataFrame from a larger one according to a specific criterion: for each ID, retain only the row with the maximum value in the VALUES column?

Here is a toy DataFrame, followed by the table I have created so far, which holds the maximum of VALUES for each ID:

import numpy as np
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])
df
     ID VALUES
0   id1      4
1   id1      5
2   id2      1
3   id2      2
4   id2      3
5   id3      6

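# Per-ID maximum of VALUES: pivot with ID as columns, then transpose back
criterion_table = df.pivot_table(values='VALUES', columns='ID', aggfunc=np.max).T
# Move the ID index into an ordinary column and restore a default integer index
criterion_table['ID'] = criterion_table.index
criterion_table = criterion_table.reset_index(drop=True)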
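For comparison, a plain groupby should produce the same per-ID maxima without the pivot and the two transposes. This is a minimal sketch, not part of the original question; it assumes the goal is simply one row per ID with its largest VALUES:

```python
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])

# One row per ID with the maximum of VALUES; as_index=False keeps ID
# as an ordinary column instead of turning it into the index
criterion_table = df.groupby('ID', as_index=False)['VALUES'].max()
print(criterion_table)
```

Unlike the pivot_table version, the columns come out in the order ID, VALUES.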

Given this table, I now need to retain the rows of df whose ID and VALUES both match a row of criterion_table. The expected result is:

     ID VALUES
0   id1      5
1   id2      3
2   id3      6

However, I got stuck at this step. I also wonder whether this can be done in one go, as a typical Pythonic one-liner.
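A minimal sketch of both parts, assuming that rows tying for an ID's maximum should all be kept. An inner merge of df with the per-ID maxima on both columns finishes the criterion-table approach, and a boolean mask built with `groupby(...).transform('max')` does the same in one line. Neither variant comes from the original post, and the names `result_merge` and `result_mask` are illustrative:

```python
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])

# Per-ID maxima, built as in the question (string 'max' in place of np.max)
criterion_table = df.pivot_table(values='VALUES', columns='ID', aggfunc='max').T
criterion_table['ID'] = criterion_table.index
criterion_table = criterion_table.reset_index(drop=True)

# 1) Inner merge on both columns: only rows whose (ID, VALUES) pair
#    appears in criterion_table survive
result_merge = df.merge(criterion_table, on=['ID', 'VALUES'], how='inner')

# 2) One-liner: transform('max') broadcasts each group's maximum back to
#    every row of that group, so the mask is True on each ID's maximum row(s)
result_mask = df[df['VALUES'] == df.groupby('ID')['VALUES'].transform('max')].reset_index(drop=True)

print(result_merge)
print(result_mask)
```

The mask variant skips the intermediate table entirely, which is usually the simpler choice when the criterion table is not needed for anything else.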

edited Aug 21 '20 at 10:55

asked by

striatum

Can you add some sample data and expected output? – jezrael Aug 21 '20 at 10:41
I think you need `idxmax()` with `groupby`, but maybe I am wrong; please add some data to explain if you need something else – jezrael Aug 21 '20 at 10:45
I have added a toy example. Thanks! – striatum Aug 21 '20 at 10:56
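The `groupby` plus `idxmax()` idea from the comment above can be sketched like this, using the toy data from the question. This is an illustration rather than a posted answer; note that `idxmax` returns the index label of the first maximum in each group, so if an ID had tied maxima only one of those rows would be kept:

```python
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])

# For each ID, idxmax gives the index label of the row with the largest VALUES
# (the first such row on ties); .loc then selects exactly those rows
result = df.loc[df.groupby('ID')['VALUES'].idxmax()].reset_index(drop=True)
print(result)
```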

0 Answers