I have df looks something like this:
index db id age score
1 1 1 1 2
2 1 1 2 1.5
3 1 2 2 3
4 1 2 3 4
5 2 1 2 3
6 2 1 1 1
7 2 2 3 2
8 2 2 5 3.5
9 3 1 4 4
...
I want to get every row that has the maximum age for each unique (db, id) pair. Result:
index db id age score
2 1 1 2 1.5
4 1 2 3 4
5 2 1 2 3
8 2 2 5 3.5
9 3 1 4 4
I used this functions but it is very time consuming:
def get_age_rel(main_df, age):
data = []
age_rel_df = main_df[main_df['age'] <= age]
for db_index in np.unique(age_rel_df['db']):
db_rel_df = age_rel_df[age_rel_df['db'] == db_index]
for some_id in np.unique(db_rel_df['id']):
data.append(max_rows(db_rel_df [db_rel_df ['id'] == some_id], 'age', 1))
return pd.concat(data,axis=1)
def max_rows(df, col, n):
max_indexes = df[col].nlargest(n)
max_indexes = list(max_indexes.index)
return df.loc[max_indexes]