1
import pandas as pd

# create the DataFrame
df = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7],
    'result': [80, 85, 90, 91, 92, 93, 94],
    'holiday': ['Dam', 'Dam', 'Easter', ['Easter', 'TomTom'], ['TomTom', 'Christmas'], None, ['Birthday', 'Halloween']]
})

# split the holiday lists into separate rows
df = df.explode('holiday')

# group by holiday name and find the maximum result and count of friends
result_count = df.groupby('holiday').agg(Max=('result', 'max'), Count=('result', 'count')).reset_index()

# create the DataFrame
df_out = pd.DataFrame({
    'index': [1, 2, 3, 4, 5, 6, 7],
    'result': [80, 85, 90, 91, 92, 93, 94],
    'holiday': ['Dam', 'Dam', 'Easter', 'TomTom', 'TomTom', None, 'Birthday']
})

Above is my dataframe and expected dataframe. I wish to transform original dataframe such that the holiday in list is replaced by a single holiday with max result, if two holiday have same result than by max count, if even count is same replace by any one holiday. I am able to get the df_out by looping over each row and doing alot of if else but wish to see if a cleaner method exist?

Lata
  • 107
  • 5
  • 1
    @gsamaras yeah sorry my bad.. noticed it later :D thnx – Kulasangar May 08 '23 at 09:31
  • 1
    Does this answer your question? [Update row values where certain condition is met in pandas](https://stackoverflow.com/questions/36909977/update-row-values-where-certain-condition-is-met-in-pandas) – valentinmk May 08 '23 at 09:38
  • When the dataframe is huge, the approach highlighted above doesn't work well or is not ideal! – Lata May 08 '23 at 10:06

1 Answers1

0

You can merge the Max and Count back to the exploded dataframe and then sort it by Max and Count, then for each index, choose the first record. Also when groupby holiday, can set dropna False to keep the None group.

df_explode = df.explode('holiday')
# keep the None group
result_count = df_explode.groupby('holiday', dropna=False).agg(Max=('result', 'max'), Count=('result', 'count')).reset_index()
df_explode = df_explode.merge(result_count, how='left')
result = df_explode.sort_values(['Max', 'Count'], ascending=False).groupby('index').first()
result = result.reset_index().drop(['Max', 'Count'], axis=1)
viminal
  • 76
  • 1
  • 5