I have a dataframe storing levels per quarter, df1
:
| id | year | quarter | level |
|-----|------|---------|--------|
| 111 | 2021 | 1 | Silver |
| 111 | 2021 | 2 | Gold |
| 222 | 2021 | 1 | Bronze |
| 222 | 2021 | 2 | Silver |
I also have another dataframe, storing the same data but not grouped by quarter, df2
:
| id | level |
|-----|--------|
| 111 | Bronze |
| 222 | Gold |
I want to calculate the max level across both dataframes but cannot use Max due to (g)old < (s)ilver. Is there a way to do a custom max which captures the rule of gold > silver > bronze?
My expected output would look like this.
| id | year | quarter | level |
|-----|------|---------|--------|
| 111 | 2021 | 1 | Silver |
| 111 | 2021 | 2 | Gold |
| 222 | 2021 | 1 | Gold |
| 222 | 2021 | 2 | Gold |
I tried this before running into the issue:
output = (
df1.join(df2, on = ['id'])
.groupby('id', 'year', 'quarter')
.agg(
F.max(F.col('level')).alias('level') #would rank Silver greater than Gold
)
)