I have data with one column containing a date (Date) and another column with categorical data (A: Yes, No, Unknown).
I'd like to show the percentage of "Yes" over time, relative to the point in time of the observation (i.e., the number of "Yes" so far divided by the cumulative row count at that point in time).
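To make the "relative to the point in time" reading concrete, here is a minimal sketch on a small made-up frame (`toy` and the column name `cum_yes_share` are hypothetical, not part of my real data). It assumes the rows are sorted by Date and divides the running count of "Yes" by the running count of all rows:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only for illustration.
toy = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-08-22", "2022-08-23", "2022-08-24", "2022-08-25", "2022-09-13"]
    ),
    "A": ["Unknown", "Yes", "No", "Unknown", "Yes"],
})

# Sort chronologically so "so far" means "up to this date".
toy = toy.sort_values("Date").reset_index(drop=True)

# Running number of "Yes" rows divided by the running number of rows.
running_yes = toy["A"].eq("Yes").astype(int).cumsum()
running_rows = np.arange(1, len(toy) + 1)
toy["cum_yes_share"] = running_yes / running_rows

print(toy)
```

If several rows share the same date, the row-level values within that date depend on the row order, so in that case it is probably safer to look only at the last value per date.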
Assume I have data like:
df
Date A
2022-08-22 Unknown
2022-08-23 Yes
2022-08-24 No
2022-08-25 Unknown
2022-09-13 Yes
# . . .
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 246 entries, 0 to 245
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 246 non-null datetime64[ns]
1 A 246 non-null object
dtypes: datetime64[ns](1), object(1)
The main question is: Is there a trend showing that A=="Yes" occurs more frequently as time progresses by year/month?
I'd like to show, for each year/month, the number of "Yes" rows as a percentage of all rows for that year/month. So if there are 10 records in June 2022, and 2 have A=="Yes", then the value for June 2022 is 20%. It might look something like this, where the value of A is the share of "Yes" expressed as a fraction:
Date  Date
2022  1       0.05
      2       0.22
      3       0.88
      4       0.79
      5       0.51
      6       0.04
      7       0.20
      8       0.91
      9       0.98
Name: A, dtype: float64
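One way to get a result of that shape, sketched on a made-up frame (`toy`, `year`, `month`, and `monthly_share` are hypothetical names), relies on the fact that the mean of a boolean Series is the fraction of True values. I have not checked this against my full data:

```python
import pandas as pd

# Hypothetical toy data: 10 rows in June 2022 (2 "Yes"), 4 rows in July 2022 (2 "Yes").
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-06-01"] * 10 + ["2022-07-01"] * 4),
    "A": ["Yes"] * 2 + ["No"] * 5 + ["Unknown"] * 3
         + ["Yes", "Yes", "No", "Unknown"],
})

# Grouping keys, renamed so the two index levels do not both end up called "Date".
year = toy["Date"].dt.year.rename("year")
month = toy["Date"].dt.month.rename("month")

# Mean of the boolean mask per group = share of "Yes" rows in that year/month.
monthly_share = toy["A"].eq("Yes").groupby([year, month]).mean()

print(monthly_share)
```

For the June example this gives 0.2, i.e. 20%; multiplying the result by 100 would turn the fractions into percentages.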
I can get a count of "Yes" by year and month like so:
df.loc[df["A"] == "Yes"]["A"].groupby([df["Date"].dt.year, df["Date"].dt.month]).agg("count")
But I'm not sure how to get the relative percentage per year/month, which requires dividing the count of A=="Yes" by the total row count for that year/month.
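The division described above could be sketched like this, again on a made-up frame (`toy`, `keys`, `yes_count`, `total`, and `percent` are hypothetical names). Summing a boolean mask instead of filtering first is an assumption on my part; it keeps months with zero "Yes" rows in the result as 0 rather than dropping them:

```python
import pandas as pd

# Hypothetical toy data: June 2022 has 10 rows (2 "Yes"), July 2022 has 3 rows (0 "Yes").
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-06-01"] * 10 + ["2022-07-01"] * 3),
    "A": ["Yes"] * 2 + ["No"] * 5 + ["Unknown"] * 3 + ["No", "No", "Unknown"],
})

keys = [
    toy["Date"].dt.year.rename("year"),
    toy["Date"].dt.month.rename("month"),
]
is_yes = toy["A"].eq("Yes")

# Numerator: number of "Yes" rows per year/month (0 when there are none).
yes_count = is_yes.groupby(keys).sum()
# Denominator: total number of rows per year/month.
total = is_yes.groupby(keys).size()

# Both Series share the same (year, month) index, so the division aligns per group.
percent = yes_count / total * 100

print(percent)
```

With the filtered count from my snippet above, a month without any "Yes" would be missing from the numerator entirely, which is why this sketch counts on the unfiltered mask.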
Sample data:
import pandas as pd
from pandas import Timestamp

d = [{'Date': Timestamp('2022-08-02 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-14 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-18 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-01-19 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-20 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-21 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-01-22 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-01-23 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-01-24 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-01-25 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-01-26 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-01-27 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-29 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-30 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-01-31 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-03 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-04 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-05 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-06 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-02-07 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-20 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-21 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-22 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-23 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-24 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-02-25 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-02-26 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-27 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-02-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-02 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-03 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-04 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-05 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-06 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-07 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-08 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-09 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-10 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-11 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-12 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-13 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-14 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-15 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-16 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-18 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-19 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-14 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-01-18 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-01-19 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-20 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-01-21 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-26 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-27 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-28 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-03-29 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-03-30 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-03-31 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-03 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-04 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-05 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-06 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-07 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-08 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-09 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-10 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-11 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-12 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-13 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-14 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-15 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-16 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-18 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-19 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-20 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-21 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-22 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-23 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-24 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-25 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-04-26 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-27 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-28 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-04-29 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-04-30 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-05-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-03 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-04 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-05 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-06 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-07 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-08 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-09 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-10 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-11 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-12 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-13 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-14 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-15 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-16 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-05-17 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-18 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-19 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-20 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-05-21 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-05-22 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-05-23 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-24 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-25 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-26 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-27 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-05-29 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-30 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-05-31 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-01 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-03 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-04 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-05 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-06 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-07 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-08 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-09 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-10 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-11 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-12 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-13 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-14 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-15 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-16 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-18 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-19 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-20 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-21 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-22 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-23 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-24 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-25 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-26 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-27 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-06-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-06-29 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-06-30 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-02 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-03 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-04 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-05 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-06 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-07 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-08 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-09 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-10 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-11 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-12 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-13 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-14 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-15 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-16 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-16 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-16 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-16 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-17 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-28 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-07-29 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-30 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-07-31 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-01 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-02 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-03 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-04 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-05 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-06 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-07 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-08 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-09 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-10 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-11 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-12 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-13 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-14 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-15 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-16 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-17 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-18 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-19 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-20 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-21 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-22 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-23 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-24 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-25 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-08-26 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-27 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-29 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-08-30 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-08-31 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-01 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-02 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-03 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-04 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-05 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-06 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-07 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-08 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-09 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-10 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-11 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-12 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-13 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-14 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-15 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-16 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-17 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-18 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-19 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-20 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-21 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-22 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-23 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-24 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-25 00:00:00'), 'A': 'Yes'},
{'Date': Timestamp('2022-09-26 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-27 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-28 00:00:00'), 'A': 'Unknown'},
{'Date': Timestamp('2022-09-29 00:00:00'), 'A': 'No'},
{'Date': Timestamp('2022-09-30 00:00:00'), 'A': 'No'}]
df = pd.DataFrame(d)