
How can I extract a new pandas DataFrame from a larger one according to a specific criterion: for each value of the ID column, retain only the row with the maximum value in the VALUES column?

So far I have built a sample DataFrame and, from it, a table holding the maximum of VALUES for each ID:

import numpy as np
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])
df
     ID VALUES
0   id1      4
1   id1      5
2   id2      1
3   id2      2
4   id2      3
5   id3      6

criterion_table = df.pivot_table(values='VALUES', columns='ID', aggfunc=np.max).T
criterion_table['ID'] = criterion_table.index
criterion_table = criterion_table.reset_index(drop=True)

Given this table, I now need to retain the rows in df that match both the ID and the VALUES entry:

     ID VALUES
0   id1      5
1   id2      3
2   id3      6

However, I am stuck at this step. I also wonder whether this can be done in one go, as a typical Pythonic one-liner?
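
For illustration, here is a minimal sketch of what such a one-step version might look like. It is an assumption about a suitable approach, not something taken from my attempt above: it uses the standard pandas groupby with transform('max') to build a boolean mask, and the names is_group_max, result and result_agg are only illustrative.

```python
import pandas as pd

raw_data = {'ID': ['id1', 'id1', 'id2', 'id2', 'id2', 'id3'],
            'VALUES': [4, 5, 1, 2, 3, 6]}
df = pd.DataFrame(raw_data, columns=['ID', 'VALUES'])

# Broadcast each group's maximum back onto its rows, then keep only
# the rows whose VALUES equals that maximum.
is_group_max = df['VALUES'] == df.groupby('ID')['VALUES'].transform('max')
result = df[is_group_max].reset_index(drop=True)

# If only the ID/VALUES pairs are needed (no other columns),
# a plain aggregation gives the same pairs in one line.
result_agg = df.groupby('ID', as_index=False)['VALUES'].max()

print(result)
```

The mask variant keeps whole rows of df, so any additional columns would survive, and ties for the maximum within an ID would all be kept. The aggregation variant returns exactly one row per ID.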

striatum