2

I have a DataFrame, you can have it by running:

import pandas as pd
from io import StringIO

df = """  

               case_id    scheduled_date        status_code
               1213       2021-08               success
               3444       2021-06               fail
               4566       2021-07               unknown
               12213      2021-08               unknown
               34344      2021-06               fail
               44566      2021-07               unknown
               1213       2021-08               fail
              
        """
df= pd.read_csv(StringIO(df.strip()), sep='\s\s+', engine='python')

This outputs:

  case_id   scheduled_date  status_code
0   1213    2021-08         success
1   3444    2021-06         fail
2   4566    2021-07         unknown
3   12213   2021-08         unknown
4   34344   2021-06         fail
5   44566   2021-07         unknown
6   1213    2021-08         fail

How can I count the number of success, fail, and unknown of each month?

Output should look like:

scheduled_date  num of success  num of fail  num of unknown

2021-08           1               1           1
2021-06           0               2           0
2021-07           0               0           2
BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
William
  • 3,724
  • 9
  • 43
  • 76
  • 3
    This calls for crosstab imo: `pd.crosstab(df['scheduled_date'], df['status_code'])`. Read [here](https://stackoverflow.com/a/73060100/19123103) for more info. – cottontail Jan 12 '23 at 20:45
  • 1
    Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – G. Anderson Jan 12 '23 at 20:48

2 Answers2

1

Here is a proposition with pandas.crosstab :

out = (
        pd.crosstab(df["scheduled_date"], df["status_code"])
            .rename_axis(None, axis=1)
            .add_prefix("num of ")
            .sort_index(ascending=False)
            .reset_index()
        )

# Output :

print(out)

  scheduled_date  num of fail  num of succuss  num of unknown
0        2021-08            1               1               1
1        2021-07            0               0               2
2        2021-06            2               0               0
Timeless
  • 22,580
  • 4
  • 12
  • 30
1

You can use .pivot_table() to create counts for each (month, status code) pair, and then use .fillna to replace NaNs with zero counts:

df.pivot_table(index="scheduled_date", columns="status_code", aggfunc=len).fillna(0)

This outputs:

               case_id
status_code       fail success unknown
scheduled_date
2021-06            2.0     0.0     0.0
2021-07            0.0     0.0     2.0
2021-08            1.0     1.0     1.0
BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
  • Thanks, it works!May I know ,how to select the fail value of a scheduled_date for exameple 2021-07 ? – William Jan 12 '23 at 21:01
  • Yes, you can use [`.loc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) for that. (You may also need `.reset_index()`.) – BrokenBenchmark Jan 12 '23 at 21:04