
I'm trying to count the frequency of two events per month using two columns from my df. What I have done so far counts events by unique timestamp, which is not efficient enough as there are too many results. I wish to create a graph with the results afterwards.

I've tried adapting my code based on the answers to related SO questions, but cannot seem to get the command working when I put freq='day' inside the groupby command.

My code is:

print(df.groupby(['Priority', 'Create Time']).Priority.count())

which initially produced something like 170,000 results, structured as follows:

Priority  Create Time        
1.0       2011-01-01 00:00:00    1
          2011-01-01 00:01:11    1
          2011-01-01 00:02:10    1
                  ...

2.0       2011-01-01 00:01:25    1
          2011-01-01 00:01:35    1
                  ...

But now for some reason (I'm using Jupyter Notebook) it only produces:

Priority  Create Time        
1.0       2011-01-01 00:00:00    1
          2011-01-01 00:01:11    1
          2011-01-01 00:02:10    1
2.0       2011-01-01 00:01:25    1
          2011-01-01 00:01:35    1
Name: Priority, dtype: int64

No idea why the output has changed to only 5 results (maybe I unknowingly changed something).

I would like the results to be in the following format:

Priority  month     Count     
1.0       2011-01     a
          2011-02     b
          2011-03     c
                ...

2.0       2011-01     x
          2011-02     y
          2011-03     z
                ...

Bonus points for showing how to change the frequency correctly to other values as well, for example hour/day/month/year. In your answers, please could you explain what is going on in your code, as I am new to pandas and wish to understand the process. Thank you.

1 Answer


One possible solution is to convert the datetime column to month periods with Series.dt.to_period:

print(df.groupby(['Priority', df['Create Time'].dt.to_period('m')]).Priority.count())

Or use Grouper:

print(df.groupby(['Priority', pd.Grouper(key='Create Time', freq='MS')]).Priority.count())

Sample:

np.random.seed(123)

df = pd.DataFrame({'Create Time':pd.date_range('2019-01-01', freq='10D', periods=10),
                   'Priority':np.random.choice([0,1], size=10)})

print (df)
  Create Time  Priority
0  2019-01-01         0
1  2019-01-11         1
2  2019-01-21         0
3  2019-01-31         0
4  2019-02-10         0
5  2019-02-20         0
6  2019-03-02         0
7  2019-03-12         1
8  2019-03-22         1
9  2019-04-01         0

print(df.groupby(['Priority', df['Create Time'].dt.to_period('m')]).Priority.count())
Priority  Create Time
0         2019-01        3
          2019-02        2
          2019-03        1
          2019-04        1
1         2019-01        1
          2019-03        2
Name: Priority, dtype: int64

print(df.groupby(['Priority', pd.Grouper(key='Create Time', freq='MS')]).Priority.count())
Priority  Create Time
0         2019-01-01     3
          2019-02-01     2
          2019-03-01     1
          2019-04-01     1
1         2019-01-01     1
          2019-03-01     2
Name: Priority, dtype: int64
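For other frequencies, the same pattern applies: to_period takes period aliases such as 'h' (hour), 'd' (day), 'm' (month), and 'Y' (year), while pd.Grouper takes offset aliases such as 'H', 'D', 'MS' (month start), and 'YS' (year start). A minimal sketch, reusing the sample DataFrame above (the column names match the question; the specific frequencies shown are just examples):

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'Create Time': pd.date_range('2019-01-01', freq='10D', periods=10),
                   'Priority': np.random.choice([0, 1], size=10)})

# Per day: each row falls on a distinct day here, so every count is 1
per_day = df.groupby(['Priority', df['Create Time'].dt.to_period('d')]).Priority.count()

# Per year: all rows fall in 2019, so counts collapse to one row per Priority
per_year = df.groupby(['Priority', df['Create Time'].dt.to_period('Y')]).Priority.count()

# Equivalent with Grouper - note the index labels are timestamps
# (e.g. 2019-01-01), not periods (e.g. 2019)
per_year_g = df.groupby(['Priority', pd.Grouper(key='Create Time', freq='YS')]).Priority.count()

print(per_day)
print(per_year)
print(per_year_g)
```

The practical difference between the two approaches is the index: to_period produces a PeriodIndex level (nice for labels like 2011-01), while Grouper produces a DatetimeIndex level anchored at the start of each interval.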
jezrael
  • For the first I'm getting `AttributeError: 'RangeIndex' object has no attribute 'to_period'` and the second only produces 2 results – pragmatic learner Oct 10 '19 at 10:37
  • Ok - first one works with `.dt.to_period('m')`. I'm still only getting 2 results though - I'm expecting a lot more. Do you know how to counter this? – pragmatic learner Oct 10 '19 at 10:40
  • @pragmaticlearner - hmmm, are datetimes like you expected? Because this seems some data related problem. – jezrael Oct 10 '19 at 10:41
  • I think so...my command when importing the csv file is `df = pd.read_csv('records-for-2011.csv', parse_dates=['Create Time'])`. It was working before with like 170000+ rows however when I started adapting it to get what I wanted it changed somehow. No idea why – pragmatic learner Oct 10 '19 at 10:44
  • @pragmaticlearner - Code seems good, is possible check if datetimes have more months like 2? – jezrael Oct 10 '19 at 10:46
  • Yes there are 12 months in the data set. When I type df and run there are 180015 rows – pragmatic learner Oct 10 '19 at 10:49