How can I use groupby()
to arrive at a count of employee types on a given day, and feed the results back into the original dataframe?
Here's the data:
shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
("Cashier", "Thursday"), ("Cook", "Thursday"),
("Cashier", "Friday"), ("Cashier", "Friday"),
("Cook", "Friday"), ("Cook", "Friday"),
("Cashier", "Saturday"), ("Cook", "Saturday"),
("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)
This use of value_counts()
produces the correct results:
shifts_series = df.groupby('DAY')['JOB_TITLE'].value_counts()
How to feed the values back into the original DF?
Desired results
JOB_TITLE DAY TYPE
0 Cashier Thursday 3
1 Cashier Thursday 3
2 Cashier Thursday 3
3 Cook Thursday 1
4 Cashier Friday 2
5 Cashier Friday 2
6 Cook Friday 2
7 Cook Friday 2
8 Cashier Saturday 1
9 Cook Saturday 2
10 Cook Saturday 2
transform()
?
I found some answers suggesting to use transform()
, but the results count the number of instances for 'DAY':
df['TYPE'] = df.groupby('DAY')['JOB_TITLE'].transform('count')
Nasty anti-pattern
I managed to make a nasty little Pandas anti-pattern using the answer to a different question. I tried to loop over the results and label [('Saturday', 'Cashier'), ('Thursday', 'Cook')]
:
shift_filter1 = shifts_series[shifts_series == 1].index.tolist()
df['WORKED_SOLO'] = np.nan
for workday, title in shift_filter1:
df['WORKED_SOLO'] = (np.where(((df['WORKED_SOLO'].isna()) & (df['DAY'] == workday) & (df['JOB_TITLE'] == title)), True, np.nan))
But the resulting DF replaces the previous loop's result--despite the isna()
test. Obviously not the Pandas way.