Pandas: Update the dataframe based on condition

Question

import pandas as pd

# create the DataFrame
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7],
    'result': [80, 85, 90, 91, 92, 93, 94],
    'holiday': ['Dam', 'Dam', 'Easter', ['Easter', 'TomTom'], ['TomTom', 'Christmas'], None, ['Birthday', 'Halloween']]
})

# split the holiday lists into separate rows
df = df.explode('holiday')

# group by holiday name and find the maximum result and count of friends
result_count = df.groupby('holiday').agg(Max=('result', 'max'), Count=('result', 'count')).reset_index()

# create the DataFrame
df_out = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7],
    'result': [80, 85, 90, 91, 92, 93, 94],
    'holiday': ['Dam', 'Dam', 'Easter', 'TomTom', 'TomTom', None, 'Birthday']
})

Above is my dataframe and expected dataframe. I wish to transform original dataframe such that the holiday in list is replaced by a single holiday with max result, if two holiday have same result than by max count, if even count is same replace by any one holiday. I am able to get the df_out by looping over each row and doing alot of if else but wish to see if a cleaner method exist?

Does this answer your question? [Update row values where certain condition is met in pandas](https://stackoverflow.com/questions/36909977/update-row-values-where-certain-condition-is-met-in-pandas) — valentinmk, May 08 '23 at 09:38
When the dataframe is huge, the approach highlighted above doesn't work well or is not ideal! — Lata, May 08 '23 at 10:06

score 0 · Answer 1 · answered May 08 '23 at 12:39

You can merge the Max and Count back to the exploded dataframe and then sort it by Max and Count, then for each index, choose the first record. Also when groupby holiday, can set dropna False to keep the None group.

df_explode = df.explode('holiday')
# keep the None group
result_count = df_explode.groupby('holiday', dropna=False).agg(Max=('result', 'max'), Count=('result', 'count')).reset_index()
df_explode = df_explode.merge(result_count, how='left')
result = df_explode.sort_values(['Max', 'Count'], ascending=False).groupby('index').first()
result = result.reset_index().drop(['Max', 'Count'], axis=1)

Pandas: Update the dataframe based on condition

1 Answers1