0

Using Python/Pandas and given a name-value (Bob), how do I value_count by job and state, and list by month. Initial df:

enter image description here

The output for Bob should be something like this::

enter image description here

I am working with many thousands of records and I need some way to retrieve the number of Jobs codes and states for a specific name for each month.

I can get this information individually using .loc and value_counts(), but not all at once. Grouby stats is not specific. Any ideas???

Marty
  • 53
  • 1
  • 9
  • Possible duplicate of [Get statistics for each group (such as count, mean, etc) using pandas GroupBy?](https://stackoverflow.com/questions/19384532/get-statistics-for-each-group-such-as-count-mean-etc-using-pandas-groupby) – ndclt Sep 28 '19 at 20:01
  • Dont post pictures of data/code – Erfan Sep 28 '19 at 20:30
  • Did not know about images. I won't do it again. As far as getting statistics for each group, the link did not discuss getting statistics for a specific value (name). If I do group statistics and groupby I get results for 400+ names. I am only interested in one name. – Marty Sep 28 '19 at 22:50
  • I added my eventual answer below. Thanks for pointing me in the right direction. – Marty Sep 29 '19 at 17:13

2 Answers2

1

I added one row (for June) to your test DataFrame:

   Name State Job Code Month
0   Bob    CA        A   APR
1   Joe    CA        B   APR
2  Mary    AZ        C   MAY
3   Bob    CA        D   MAY
4   Bob    NV        C   MAY
5   Bob    CA        D   MAY
6   Bob    CA        D   JUN

The reason is to show that the way I provided "calendar" sort on Month (instead of alphabetical) works OK.

To provide the mentioned "calendar" sort, start with:

df.Month = pd.Categorical(df.Month, ['JAN', 'FEB', 'MAR', 'APR',
    'MAY', 'JUN', 'JUL', 'AUG', 'SEP', 'OCT', 'NOV', 'DEC'])

Then, almost all processing can be performed in the following instruction:

res = df.query('Name == "Bob"').drop(columns=['Name'])\
    .set_index('Month').stack().reset_index(name='val')\
    .groupby(['Month', 'level_1', 'val']).size()\
    .sort_index(ascending=[True, False, True]).droplevel(1)

This instruction is quite long (involves multiple steps), so to understand it run each step separately and see the intermediate results.

I wrote "almost all" before, because for now the MultiIndex has unnecessary column names. To clear them, run:

res.index.names = ['', '']

And the final result is:

APR  CA    1
     A     1
MAY  CA    2
     NV    1
     C     1
     D     2
JUN  CA    1
     D     1
dtype: int64

Note that data for JUN is after MAY, just as it should be.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • I tried this code but I couldn't get past an error related to the expected index. Thanks anyway. I will keep the code in my back pocket to maybe use for something else. – Marty Sep 29 '19 at 17:12
0

I was hoping to get one table but I was able to get close enough using groupby and two .count() methods. I figured out I needed to filter the dataframe first to only the value I wanted (Bob). This may seem obvious but I am new. Thanks for everyone's help.

df1 = df[df['Name'] == 'Bob']
grp1 = df1.groupby(['Month', 'Job Code']) 
grp1.count()

grp2 = df1.groupby(['Month', 'State']) 
grp2.count()
Marty
  • 53
  • 1
  • 9