1

I have a Dataframe df ,you can have it by running the following code:

import pandas as pd
from io import StringIO
    
df = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(df.strip()), sep='\s\s+', engine='python')

How can I count how many status_reviews and supply_review are True in each month?

The output should look like the following:

    month       # of true status_review      # of true supply_review  
2023-01-01           1                         1              
2022-12-01           1                         2
BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
William
  • 3,724
  • 9
  • 43
  • 76

3 Answers3

2

You can use .groupby() and .sum(), which is more concise than using .agg():

df.groupby("month").sum()

This outputs:

            status_review  supply_review
month
2022-12-01              1              2
2023-01-01              1              1
BrokenBenchmark
  • 18,126
  • 7
  • 21
  • 33
2

Assuming you have one date per month, like in this case, you can simply use groupby and sum all columns.

import pandas as pd
from io import StringIO
    
# setup sample data
df = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(df.strip()), sep='\s\s+', engine='python')

# sum all rows by month column
df.groupby('month').agg('sum')

If you have multiple dates per month, then you can convert the column to datetime type and then sum by month as follows:

df['month'] = pd.to_datetime(df['month'])
df.groupby([pd.Grouper(key='month', freq='M')]).agg('sum')
Harry Haller
  • 414
  • 4
  • 7
-1

use groupby and value_counts() with a reset_index to create a dataframe column for the counts. find the p-value by using the wilcoxon

data = """  
    month       status_review             supply_review  
2023-01-01           False                    False                   
2023-01-01           True                     True                  
2022-12-01           False                    True                  
2022-12-01           True                     True                
2022-12-01           False                    False   
    """
df= pd.read_csv(StringIO(data.strip()), sep='\s\s+', engine='python')
print(df)

results_status=df.groupby('month')['status_review'].value_counts().reset_index(name='n').sort_values('month')
results_supply=df.groupby('month')['supply_review'].value_counts().reset_index(name='n').sort_values('month')

results=pd.concat([results_status,results_supply],axis=1)
print(results)  

output

    month  status_review  n       month  supply_review  n
0  2022-12-01          False  2  2022-12-01           True  2
1  2022-12-01           True  1  2022-12-01          False  1
2  2023-01-01          False  1  2023-01-01          False  1
3  2023-01-01           True  1  2023-01-01           True  1
Golden Lion
  • 3,840
  • 2
  • 26
  • 35