1

How can I use groupby() to arrive at a count of employee types on a given day, and feed the results back into the original dataframe?

Here's the data:

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
        ("Cashier", "Thursday"), ("Cook", "Thursday"),
        ("Cashier", "Friday"), ("Cashier", "Friday"),
        ("Cook", "Friday"), ("Cook", "Friday"),
        ("Cashier", "Saturday"), ("Cook", "Saturday"),
        ("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)

This use of value_counts() produces the correct results:

shifts_series = df.groupby('DAY')['JOB_TITLE'].value_counts()

How to feed the values back into the original DF?

Desired results

    JOB_TITLE   DAY TYPE
0   Cashier Thursday    3
1   Cashier Thursday    3
2   Cashier Thursday    3
3   Cook    Thursday    1
4   Cashier Friday      2
5   Cashier Friday      2
6   Cook    Friday      2
7   Cook    Friday      2
8   Cashier Saturday    1
9   Cook    Saturday    2
10  Cook    Saturday    2

transform()?

I found some answers suggesting to use transform(), but the results count the number of instances for 'DAY':

df['TYPE'] = df.groupby('DAY')['JOB_TITLE'].transform('count')

Nasty anti-pattern

I managed to make a nasty little Pandas anti-pattern using the answer to a different question. I tried to loop over the results and label [('Saturday', 'Cashier'), ('Thursday', 'Cook')]:

shift_filter1 = shifts_series[shifts_series == 1].index.tolist()
df['WORKED_SOLO'] = np.nan
for workday, title in shift_filter1:
    df['WORKED_SOLO'] = (np.where(((df['WORKED_SOLO'].isna()) & (df['DAY'] == workday) & (df['JOB_TITLE'] == title)), True, np.nan))

But the resulting DF replaces the previous loop's result--despite the isna() test. Obviously not the Pandas way.

xtian
  • 2,765
  • 7
  • 38
  • 65

1 Answers1

1

You can do the following:

import pandas as pd

shifts = [("Cashier", "Thursday"), ("Cashier", "Thursday"),
        ("Cashier", "Thursday"), ("Cook", "Thursday"),
        ("Cashier", "Friday"), ("Cashier", "Friday"),
        ("Cook", "Friday"), ("Cook", "Friday"),
        ("Cashier", "Saturday"), ("Cook", "Saturday"),
        ("Cook", "Saturday")]
labels = ["JOB_TITLE", "DAY"]
df = pd.DataFrame.from_records(shifts, columns=labels)


shifts_series = df.groupby('DAY')['JOB_TITLE'].value_counts()
shifts_series = shifts_series.reset_index(name='TYPE')

df = pd.merge(df, shifts_series, on=['JOB_TITLE', 'DAY'])
print(df)

which gives:

   JOB_TITLE       DAY  TYPE
0    Cashier  Thursday     3
1    Cashier  Thursday     3
2    Cashier  Thursday     3
3       Cook  Thursday     1
4    Cashier    Friday     2
5    Cashier    Friday     2
6       Cook    Friday     2
7       Cook    Friday     2
8    Cashier  Saturday     1
9       Cook  Saturday     2
10      Cook  Saturday     2