Determine counts for each column by label

Question

I have a very large (308801, 256) dataframe I am working with. In the dataframe, there is a column, ON_TIME, which holds the values 1 or 0 (yes, we made the delivery on time, or no, we did not).

I would like to, for each column, count the amount of times each value was a 1, or a 0. An example dataset looks like this:

| Delivery_Type | Delivery_Driver | ON_TIME |
|:-------------:|:---------------:|:-------:|
|       A       |    Wundermahn   |    1    |
|       B       |    Wundermahn   |    0    |
|       B       |    Wundermahn   |    0    |
|       A       |    Jon Skeet    |    1    |
|       C       |    Jon Skeet    |    1    |
|       A       |    Wundermahn   |    0    |

I want a dataset, for each column, that looks like this:

| Delivery_Type | ON_TIME_1 | ON_TIME_0 |
|:-------------:|:---------:|:---------:|
|       A       |     2     |     1     |
|       B       |     0     |     2     |
|       C       |     1     |     0     |

I know in SQL, I could do something like:

SELECT
    DELIVERY_TYPE,
    SUM(CASE WHEN ON_TIME = 1 THEN 1 ELSE 0 END AS ON_TIME_1) AS ON_TIME_1,
    SUM(CASE WHEN ON_TIME = 0 THEN 1 ELSE 0 END AS ON_TIME_0) AS ON_TIME_0
FROM
    dataframe
GROUP BY
    DELIVERY_TYPE

But how can I do this in Python? I have tried:

for col in df:
    temp = df[col].groupby('ON_TIME')
    print(temp)

But that is not working. How can I achieve the desired result in pandas?

Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) — G. Anderson, Mar 31 '20 at 17:48
Also this [link](https://stackoverflow.com/questions/45752601/how-to-do-a-conditional-count-after-groupby-on-a-pandas-dataframe) seems similar — pc_pyr, Mar 31 '20 at 17:52

score 2 · Accepted Answer · answered Mar 31 '20 at 17:48

Use pd.crosstab with DataFrame.add_prefix:

df_count = (pd.crosstab(df['Delivery_Type'], df['ON_TIME'], colnames=[None])
              .add_prefix('ON_TIME_')
              .reset_index())
print(df_count)

  Delivery_Type  ON_TIME_0  ON_TIME_1
0             A          1          2
1             B          2          0
2             C          0          1

Determine counts for each column by label

1 Answers1