Seems to me like you just want a crosstab. It's more straightforward to calculate using pd.crosstab
than it is using a groupby/apply
approach. Though pd.crosstab
isn't the most efficient method of doing this, it works just fine.
This first snippet gets us the counts of "yes" and "no" for each team. We also get a "Total" column that is the sum of the "yes" + "no":
xtab = pd.crosstab(data["Team"], data["Timely"], margins=True, margins_name="Total")
print(xtab)
Timely No Yes Total
Team
Team 1 1 4 5
Team 2 2 2 4
Team 3 1 0 1
Total 4 6 10
Now let's calculate the percentage breakdown of "No" and "Yes" by dividing those both by our "Total" column:
norm_xtab = xtab[["No", "Yes"]].div(xtab["Total"], axis=0)
print(norm_xtab)
Timely No Yes
Team
Team 1 0.2 0.8
Team 2 0.5 0.5
Team 3 1.0 0.0
Total 0.4 0.6
Then we can combine those two dataframes horizontally with the pd.concat
function. We can also give names to each "subdataframe" e.g. one chunk of the dataframe will only be for count data, the other chunk will be for the normalized/percentage based data:
out = pd.concat([xtab, norm_xtab], keys=["count", "percent"], axis=1)
out = out.rename(columns={"No": "untimely", "Yes": "timely"}, level=1)
print(out)
count percent
Timely untimely timely Total untimely timely
Team
Team 1 1 4 5 0.2 0.8
Team 2 2 2 4 0.5 0.5
Team 3 1 0 1 1.0 0.0
Total 4 6 10 0.4 0.6
The result of the concatenation is a DataFrame with a multiindex as the columns. If you are unfamiliar with pd.MultiIndex, you can use this snippet to flatten your result into something you may be more familiar with:
out.columns = ["{}_{}".format(*colname) for colname in out.columns]
print(out)
count_untimely count_timely count_Total percent_untimely percent_timely
Team
Team 1 1 4 5 0.2 0.8
Team 2 2 2 4 0.5 0.5
Team 3 1 0 1 1.0 0.0
Total 4 6 10 0.4 0.6
To change the column names as in your comment below, you can simply run rename
one more time:
out = out.rename(columns=lambda colname: colname.replace("count_", ""))
print(out)
untimely timely Total percent_untimely percent_timely
Team
Team 1 1 4 5 0.2 0.8
Team 2 2 2 4 0.5 0.5
Team 3 1 0 1 1.0 0.0
Total 4 6 10 0.4 0.6
To convert the decimals to whole percentages, you'll need to multiply by 100, then either round to 0 decimal places OR use string formatting to trim the trailing decimals (I'll show you how to do both), and add another string formatting to get the "%" to appear.
percentages = (out
.filter(like="percent") # select columns that contain the string "percent"
.mul(100) # Multiply everything by 100
.round(0) # If you don't want to perform any rounding, get rid of this line
.applymap("{:.0f}%".format)) # Apply string formatting to trim the decimal.
print(percentages)
percent_untimely percent_timely
Team
Team 1 20% 80%
Team 2 50% 50%
Team 3 100% 0%
Total 40% 60%
If you want these values reflect back in our out
dataframe:
out.loc[:, percentages.columns] = percentages
print(out)
untimely timely Total percent_untimely percent_timely
Team
Team 1 1 4 5 20% 80%
Team 2 2 2 4 50% 50%
Team 3 1 0 1 100% 0%
Total 4 6 10 40% 60%