How can I extract a new pandas DataFrame from a larger one according to a specific criterion: for each value of an ID column, retain only the row with the maximum value in the VALUES column?
So far I have built an example DataFrame and a table of the per-ID maxima:
import numpy as np
import pandas as pd
raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])
df
ID VALUES
0 id1 4
1 id1 5
2 id2 1
3 id2 2
4 id2 3
5 id3 6
criterion_table = df.pivot_table(values='VALUES', columns='ID', aggfunc=np.max).T
criterion_table['ID'] = criterion_table.index
criterion_table = criterion_table.reset_index(drop=True)
Given this table, I now need to retain the rows in df that match both the ID and the VALUES entry:
ID VALUES
0 id1 5
1 id2 3
2 id3 6
However, I am stuck at this point. Also, I wonder whether this can be done in one go, as a typical Pythonic one-liner?
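For reference, here is a minimal sketch of the kind of result I am after, not a definitive solution. It assumes that `groupby` with `max` or `idxmax` is a reasonable route, and that an inner `merge` on both columns is an acceptable way to match rows against the criterion table. The names `result`, `rows` and `matched` are mine, for illustration only, and the criterion table is hard-coded here to mirror the per-ID maxima shown above.

```python
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])

# One-step version: largest VALUES entry per ID.
# as_index=False keeps ID as an ordinary column instead of the index.
result = df.groupby('ID', as_index=False)['VALUES'].max()

# Variant that keeps the complete original rows (useful if df has more columns):
# idxmax gives the row label of the maximum within each group.
rows = df.loc[df.groupby('ID')['VALUES'].idxmax()].reset_index(drop=True)

# Two-step version: match df against a criterion table on both columns.
# The table is hard-coded to mirror the per-ID maxima from the question.
criterion_table = pd.DataFrame({'VALUES': [5, 3, 6],
                                'ID': ['id1', 'id2', 'id3']})
matched = df.merge(criterion_table, on=['ID', 'VALUES'], how='inner')

print(result)
```

Note that the variants can differ when a maximum is tied within an ID: the `merge` version would keep every tied row, while `idxmax` keeps only the first one.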