0

I have df looks something like this:

index db id age score
1     1  1  1   2
2     1  1  2   1.5
3     1  2  2   3
4     1  2  3   4
5     2  1  2   3
6     2  1  1   1
7     2  2  3   2
8     2  2  5   3.5
9     3  1  4   4
...

I want to get every row that has the maximum age for each unique (db, id) pair. Result:

index db id age score
2     1  1  2   1.5
4     1  2  3   4
5     2  1  2   3
8     2  2  5   3.5
9     3  1  4   4

I used this functions but it is very time consuming:

def get_age_rel(main_df, age):
    data = []
    age_rel_df = main_df[main_df['age'] <= age]
    for db_index in np.unique(age_rel_df['db']):
        db_rel_df = age_rel_df[age_rel_df['db'] == db_index]
        for some_id in np.unique(db_rel_df['id']):
            data.append(max_rows(db_rel_df [db_rel_df ['id'] == some_id], 'age', 1))
    return pd.concat(data,axis=1)

def max_rows(df, col, n):

    max_indexes = df[col].nlargest(n)
    max_indexes = list(max_indexes.index)

    return df.loc[max_indexes]

0 Answers0