I have a challenges DataFrame from the Great British Baking Show. Feel free to download the dataset:
pd.read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-10-25/challenges.csv")
I've cleaned up the table and now have the columns series (1 through 10), episode (6 through 10), baker (the name of each baker), and result (what happened to the baker each week: eliminated vs. still on the show). I am looking for a solution that lets me add a new column called final_score that lists the final placement of each baker in each series.
In English, what I am trying to do is:
- Count the number of unique bakers per series.
- For each series, for each episode, if result == 'OUT', add a column to the DF that records the baker's final score. The first score in each season equals the baker count from step 1. I then subtract 1 from the baker count for every baker eliminated, so the next episode starts from the reduced count.
As an example, the number of bakers in season 1 is 10. In episode 1, both Lea and Mark were eliminated, so I want final_score to read 10 for both of them. In episode 2, both Annetha and Louise were eliminated, so I want their score to read 8.
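For reference, the steps above might be sketched without explicit loops roughly as follows. This is only a sketch under assumptions: the toy rows are made up (apart from the four names in the example), the helper column out_before is a name introduced here, and the real data may contain result values other than 'IN' and 'OUT', which this simply leaves as NaN.

```python
import pandas as pd

# Toy data (hypothetical apart from the four names in the example):
# 10 bakers in series 1, two eliminated in each of episodes 1 and 2.
bakers = ["Lea", "Mark", "Annetha", "Louise"] + [f"baker_{i}" for i in range(6)]
rows = []
for ep, out in [(1, {"Lea", "Mark"}), (2, {"Annetha", "Louise"})]:
    for b in bakers:
        rows.append({"series": 1, "episode": ep, "baker": b,
                     "result": "OUT" if b in out else "IN"})
    bakers = [b for b in bakers if b not in out]  # eliminated bakers leave
df = pd.DataFrame(rows)

# Eliminations per (series, episode), then how many were out BEFORE each episode.
is_out = df["result"].eq("OUT")
out_per_ep = is_out.groupby([df["series"], df["episode"]]).sum()
out_before = out_per_ep.groupby(level="series").cumsum() - out_per_ep
out_before = out_before.rename("out_before").reset_index()

# Attach that running count to every row of the matching episode.
df = df.merge(out_before, on=["series", "episode"], how="left")

# Unique bakers per series, broadcast to every row (step 1).
total = df.groupby("series")["baker"].transform("nunique")

# Score = series total minus earlier eliminations, kept only on 'OUT' rows.
df["final_score"] = (total - df["out_before"]).where(df["result"].eq("OUT"))
print(df[df["result"].eq("OUT")][["series", "episode", "baker", "final_score"]])
```

Rows where the baker stays in are left as NaN, so each baker's placement appears only on the row for the episode in which they were eliminated.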
I've tried window functions, apply functions, and list comprehensions, but the closest I've gotten is pasted below. With attempt 1, I know the problem is at the line: if df.result == 'OUT':. I understand that df.result is a Series, but I've tried .result.items(), result.all(), result.any(), and if df.loc[df.result] == 'OUT': and nothing seems to work.
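To make the failure concrete: comparing a Series to 'OUT' gives one boolean per row, and pandas refuses to reduce that to a single True/False inside an if. Using .all() or .any() does return a single boolean, but it describes the whole column rather than each row. A small sketch with toy rows (the baker name baker_x and the value 10 are placeholders, not real data):

```python
import pandas as pd

df = pd.DataFrame({"baker": ["Lea", "Mark", "baker_x"],  # toy rows only
                   "result": ["OUT", "OUT", "IN"]})

# Element-wise comparison: a boolean Series with one value per row.
mask = df["result"] == "OUT"

# Branching on the whole Series raises, because it has no single truth value.
error_name = None
try:
    if mask:
        pass
except ValueError as err:
    error_name = type(err).__name__
print(error_name)

# Instead of branching, use the mask to select the rows to assign to.
df.loc[mask, "final_score"] = 10  # 10 is a placeholder score
print(df)
```

The assignment through df.loc only touches the rows where the mask is True, which is the per-row behaviour the if statement was meant to express.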
Attempt 1
def final_score(df):
    #count the number of bakers per season
    baker_count = df.groupby('series')['baker'].nunique()
    #for each season
    for s in df.series:
        #create an iterator that counts the number of bakers that have been eliminated. Start at 0
        bakers_out = 0
        bakers_remaining = baker_count[int(s)]
        #for each episode
        for e in df.episode:
            #does result say OUT for each contestant?
            if df.result =='OUT':
                df['final_score'] = bakers_remaining
                #if so, then we'll add +1 to our bakers_out iterator.
                bakers_out +=1
                #set the final score category to our baker_count iterator
                df['final_score'] = bakers_remaining
                #subtract the number of bakers left by the amount we just lost
                bakers_remaining -= bakers_out
            else:
                next
    return df
Attempt 2 wasn't about creating a new DataFrame but about troubleshooting the problem and printing my desired output to the console. This is pretty close, but I want tied bakers to share a place and the next place to drop by the number of bakers eliminated, so the two bakers who got out in series 1, episode 1 should both end up in 10th place, and the two bakers who got out the following week should both show 8th place.
baker_count = df.groupby('series')['baker'].nunique()
#for each series
for s in df.series.unique():
    bakers_out = 0
    bakers_remaining = baker_count[int(s)]
    #for each episode
    for e in df.episode.unique():
        #create a list of results
        data_results = list(df[(df.series==s) & (df.episode==e)].result)
        for dr in data_results:
            if dr =='OUT':
                bakers_out += 1
                print (s,e,dr,';final place:',bakers_remaining,';bakers out:',bakers_out)
            else:
                print (s,e,dr,'--')
        bakers_remaining -= 1
Snippet of the result
1.0 1.0 IN --
1.0 1.0 IN --
1.0 1.0 IN --
1.0 1.0 IN --
1.0 1.0 IN --
1.0 1.0 OUT ;final place: 10 ;bakers out: 1
1.0 1.0 OUT ;final place: 10 ;bakers out: 2
1.0 2.0 IN --
1.0 2.0 IN --
1.0 2.0 IN --
1.0 2.0 IN --
1.0 2.0 IN --
1.0 2.0 IN --
1.0 2.0 OUT ;final place: 9 ;bakers out: 3
1.0 2.0 OUT ;final place: 9 ;bakers out: 4
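The output shows the place dropping by 1 per episode (10, then 9) rather than by the number of bakers eliminated. One possible adjustment to the loop, sketched here under assumptions, is to decrement by the count of 'OUT' rows in each episode. The function name placements and the toy rows are hypothetical, and the sketch assumes episodes can be sorted into airing order.

```python
import pandas as pd

def placements(df):
    """Return (series, episode, baker, place) for every eliminated baker."""
    baker_count = df.groupby("series")["baker"].nunique()
    rows = []
    for s in df["series"].unique():
        bakers_remaining = baker_count[s]
        # Walk the episodes of this series in order.
        for e in sorted(df.loc[df["series"] == s, "episode"].unique()):
            out = df[(df["series"] == s)
                     & (df["episode"] == e)
                     & (df["result"] == "OUT")]
            # Everyone eliminated in the same episode shares one place.
            for baker in out["baker"]:
                rows.append((s, e, baker, bakers_remaining))
            # Drop by the number eliminated this episode, not by 1.
            bakers_remaining -= len(out)
    return rows

# Toy check with made-up rows: 4 bakers, two out in episode 1, one in episode 2.
toy = pd.DataFrame({
    "series":  [1, 1, 1, 1, 1, 1],
    "episode": [1, 1, 1, 1, 2, 2],
    "baker":   ["a", "b", "c", "d", "c", "d"],
    "result":  ["OUT", "OUT", "IN", "IN", "OUT", "IN"],
})
print(placements(toy))
```

With this change, the two bakers eliminated in the same episode share a place and the next eliminated baker lands two places higher.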