1

I have a very large (308801, 256) dataframe I am working with. In the dataframe, there is a column, ON_TIME, which holds the values 1 or 0 (yes, we made the delivery on time, or no, we did not).

I would like to, for each column, count the amount of times each value was a 1, or a 0. An example dataset looks like this:

| Delivery_Type | Delivery_Driver | ON_TIME |
|:-------------:|:---------------:|:-------:|
|       A       |    Wundermahn   |    1    |
|       B       |    Wundermahn   |    0    |
|       B       |    Wundermahn   |    0    |
|       A       |    Jon Skeet    |    1    |
|       C       |    Jon Skeet    |    1    |
|       A       |    Wundermahn   |    0    |

I want a dataset, for each column, that looks like this:

| Delivery_Type | ON_TIME_1 | ON_TIME_0 |
|:-------------:|:---------:|:---------:|
|       A       |     2     |     1     |
|       B       |     0     |     2     |
|       C       |     1     |     0     |

I know in SQL, I could do something like:

SELECT
    DELIVERY_TYPE,
    SUM(CASE WHEN ON_TIME = 1 THEN 1 ELSE 0 END AS ON_TIME_1) AS ON_TIME_1,
    SUM(CASE WHEN ON_TIME = 0 THEN 1 ELSE 0 END AS ON_TIME_0) AS ON_TIME_0
FROM
    dataframe
GROUP BY
    DELIVERY_TYPE

But how can I do this in Python? I have tried:

for col in df:
    temp = df[col].groupby('ON_TIME')
    print(temp)

But that is not working. How can I achieve the desired result in pandas?

artemis
  • 6,857
  • 11
  • 46
  • 99
  • 1
    Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) – G. Anderson Mar 31 '20 at 17:48
  • Also this [link](https://stackoverflow.com/questions/45752601/how-to-do-a-conditional-count-after-groupby-on-a-pandas-dataframe) seems similar – pc_pyr Mar 31 '20 at 17:52
  • I think it is not duplicated, here we must rename – ansev Mar 31 '20 at 17:57

1 Answers1

2

Use pd.crosstab with DataFrame.add_prefix:

df_count = (pd.crosstab(df['Delivery_Type'], df['ON_TIME'], colnames=[None])
              .add_prefix('ON_TIME_')
              .reset_index())
print(df_count)

  Delivery_Type  ON_TIME_0  ON_TIME_1
0             A          1          2
1             B          2          0
2             C          0          1
ansev
  • 30,322
  • 5
  • 17
  • 31