1

I have a .csv output dictating the confirmed times every ticket in a day was created. I am trying to get total tickets confirmed for each person for each day. Here is sample data where the actual # of rows is in the 1000s:

Date_Created,Confirmed_By
4/6/2022 10:35,Bob
4/6/2022 11:39,Bob
4/6/2022 12:19,Tim
4/7/2022 7:08,Bob
4/7/2022 7:30,Sally
4/7/2022 7:35,Bob
4/8/2022 8:09,Sally
4/8/2022 8:28,Jack
4/8/2022 8:40,Jack
4/8/2022 10:00,Sally
4/8/2022 10:23,Jack

I would like to see a table with the first column as a list of dates and the subsequent columns being names with the total confirmations for each date associated:

Dates Bob Tim Jack Sally
2022-04-06 2 1 0 0
2022-04-07 2 0 0 1
2022-04-08 0 0 3 2

I tried pulling the csv into a dataframe and using value_counts() to create series for individual people, but I was never going to be 100% on the names as we would have random fillins that wouldn't be accounted for.

Bob = df[df["Confirmed By"].str.contains("Bob")]
Bob_Days = Bob["Date_Confirmed"].value_counts().rename("Bob")
conf_per_day = pd.concat(
    [Bob_Days, Tim_Days, Jack_Days, Sally_Days], axis=1
)

Then I tried a df.value_count().to_frame('counts').reset_index() which gave dates, names, and totals each on individual lines, but could not figure out how to spin those names into columns while not having repeating dates. Any ideas for totaling confirmations per person per day?

Sphynx
  • 33
  • 5
  • 1
    Does this answer your question? [How can I pivot a dataframe?](https://stackoverflow.com/questions/47152691/how-can-i-pivot-a-dataframe) – BeRT2me Jun 14 '22 at 22:11
  • Thank you @BeRT2me that looks to be extremely informative. I will definitely learn a lot from that. – Sphynx Jun 15 '22 at 02:11

3 Answers3

4

Use crosstab

>>> pd.crosstab(df['Date_Created'].dt.date, df['Confirmed_By'])

Confirmed_By  Bob  Jack  Sally  Tim
Date_Created                       
2022-04-06      2     0      0    1
2022-04-07      2     0      1    0
2022-04-08      0     3      2    0
rafaelc
  • 57,686
  • 15
  • 58
  • 82
1

I would do something like that:

import pandas as pd

df = pd.read_csv('your_test_data.csv')

# create date column
df['Date'] = pd.to_datetime(df['Date_Created']).dt.date

# pivot table with counting
result = df.pivot_table(index='Date', columns='Confirmed_By', aggfunc='count').fillna(0)

1

One possibile way is to use resample() and value_counts():

df['Date_Created'] = pd.to_datetime(df['Date_Created'])

df.resample('D',on = 'Date_Created')['Confirmed_By'].value_counts().unstack().fillna(0)

Output:

Confirmed_By  Bob  Jack  Sally  Tim
Date_Created                       
2022-04-06    2.0   0.0    0.0  1.0
2022-04-07    2.0   0.0    1.0  0.0
2022-04-08    0.0   3.0    2.0  0.0
rhug123
  • 7,893
  • 1
  • 9
  • 24