Python/Pandas, count a specific values based on multiple values

Question

Using Python/Pandas and given a name-value (Bob), how do I value_count by job and state, and list by month. Initial df:

The output for Bob should be something like this::

I am working with many thousands of records and I need some way to retrieve the number of Jobs codes and states for a specific name for each month.

I can get this information individually using .loc and value_counts(), but not all at once. Grouby stats is not specific. Any ideas???

Possible duplicate of [Get statistics for each group (such as count, mean, etc) using pandas GroupBy?](https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby) — ndclt, Sep 28 '19 at 20:01
Did not know about images. I won't do it again. As far as getting statistics for each group, the link did not discuss getting statistics for a specific value (name). If I do group statistics and groupby I get results for 400+ names. I am only interested in one name. — Marty, Sep 28 '19 at 22:50
I added my eventual answer below. Thanks for pointing me in the right direction. — Marty, Sep 29 '19 at 17:13

Valdi_Bo · Answer 1 · 2019-09-28T21:28:45.957

I added one row (for June) to your test DataFrame:

   Name State Job Code Month
0   Bob    CA        A   APR
1   Joe    CA        B   APR
2  Mary    AZ        C   MAY
3   Bob    CA        D   MAY
4   Bob    NV        C   MAY
5   Bob    CA        D   MAY
6   Bob    CA        D   JUN

The reason is to show that the way I provided "calendar" sort on Month (instead of alphabetical) works OK.

To provide the mentioned "calendar" sort, start with:

df.Month = pd.Categorical(df.Month, ['JAN', 'FEB', 'MAR', 'APR',
    'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

Then, almost all processing can be performed in the following instruction:

res = df.query('Name == "Bob"').drop(columns=['Name'])\
    .set_index('Month').stack().reset_index(name='val')\
    .groupby(['Month', 'level_1', 'val']).size()\
    .sort_index(ascending=[True, False, True]).droplevel(1)

This instruction is quite long (involves multiple steps), so to understand it run each step separately and see the intermediate results.

I wrote "almost all" before, because for now the MultiIndex has unnecessary column names. To clear them, run:

res.index.names = ['', '']

And the final result is:

APR  CA    1
     A     1
MAY  CA    2
     NV    1
     C     1
     D     2
JUN  CA    1
     D     1
dtype: int64

Note that data for JUN is after MAY, just as it should be.

I tried this code but I couldn't get past an error related to the expected index. Thanks anyway. I will keep the code in my back pocket to maybe use for something else. — Marty, Sep 29 '19 at 17:12

score 0 · Answer 2 · answered Sep 29 '19 at 17:09

I was hoping to get one table but I was able to get close enough using groupby and two .count() methods. I figured out I needed to filter the dataframe first to only the value I wanted (Bob). This may seem obvious but I am new. Thanks for everyone's help.

df1 = df[df['Name'] == 'Bob']
grp1 = df1.groupby(['Month', 'Job Code']) 
grp1.count()

grp2 = df1.groupby(['Month', 'State']) 
grp2.count()

Python/Pandas, count a specific values based on multiple values

2 Answers2