import pandas as pd
# create the DataFrame
df = pd.DataFrame({
'index': [1, 2, 3, 4, 5, 6, 7],
'result': [80, 85, 90, 91, 92, 93, 94],
'holiday': ['Dam', 'Dam', 'Easter', ['Easter', 'TomTom'], ['TomTom', 'Christmas'], None, ['Birthday', 'Halloween']]
})
# split the holiday lists into separate rows
df = df.explode('holiday')
# group by holiday name and find the maximum result and count of friends
result_count = df.groupby('holiday').agg(Max=('result', 'max'), Count=('result', 'count')).reset_index()
# create the DataFrame
df_out = pd.DataFrame({
'index': [1, 2, 3, 4, 5, 6, 7],
'result': [80, 85, 90, 91, 92, 93, 94],
'holiday': ['Dam', 'Dam', 'Easter', 'TomTom', 'TomTom', None, 'Birthday']
})
Above is my dataframe and expected dataframe. I wish to transform original dataframe such that the holiday in list is replaced by a single holiday with max result, if two holiday have same result than by max count, if even count is same replace by any one holiday. I am able to get the df_out by looping over each row and doing alot of if else but wish to see if a cleaner method exist?