
My data contains trips, with datetime info, a user id, and a trip type (single, round, or pseudo) for each trip.

Here's a data sample (a pandas DataFrame named All_Data):

HoraDTRetirada        idpass        type                                             
2016-02-17 15:36:00  39579449489   'single'  
2016-02-18 19:13:00  39579449489   'single' 
2016-02-26 09:20:00  72986744521   'pseudo' 
2016-02-27 12:11:00  72986744521   'round'  
2016-02-27 14:55:00  11533148958   'pseudo'
2016-02-28 12:27:00  72986744521   'round'
2016-02-28 16:32:00  72986744521   'round'

I would like to count, per user, how many times each trip type occurs within a given "week of year".

For example, if one event happens on a Monday and the next event for the same user happens on a Thursday, that makes two events in the same week; however, if one event happens on a Saturday and the next on the following Monday, they happened in different weeks.
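For illustration, one week-numbering convention that matches this behaviour is strftime's `%U`, which starts weeks on Sunday, so a Saturday and the following Monday get different numbers:

```python
import pandas as pd

# 2016-02-27 is a Saturday, 2016-02-29 the following Monday.
dates = pd.Series(pd.to_datetime(["2016-02-27", "2016-02-29"]))

# '%U' numbers weeks with Sunday as the first day, so the Saturday
# trip and the Monday trip land in different weeks.
weeks = dates.dt.strftime("%U")
print(weeks.tolist())  # → ['08', '09']
```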

The output I am looking for would be in a form like this:

idpass        weekofyear   type      frequency
39579449489    1           'single'   2
72986744521    2           'round'    3
72986744521    2           'pseudo'   1
11533148958    2           'pseudo'   1

Edit: this older question approaches a similar problem, but I don't know how to do it with pandas.

Helk

3 Answers

import pandas as pd

data = {"HoraDTRetirada": ["2016-02-17 15:36:00", "2016-02-18 19:13:00", "2016-12-31 09:20:00", "2016-02-28 12:11:00",
                           "2016-02-28 14:55:00", "2016-02-29 12:27:00", "2016-02-29 16:32:00"],
        "idpass": ["39579449489", "39579449489", "72986744521", "72986744521", "11533148958", "72986744521",
                   "72986744521"],
        "type": ["single", "single", "pseudo", "round", "pseudo", "round", "round"]}
df = pd.DataFrame.from_dict(data)
print(df)

# Parse the timestamps and derive the week of year ('%U': weeks start on Sunday).
df["HoraDTRetirada"] = pd.to_datetime(df["HoraDTRetirada"])
df["week"] = df["HoraDTRetirada"].dt.strftime("%U")

# Count rows per (user, week, type); count() fills in every remaining column.
k = df.groupby(["idpass", "week", "type"], as_index=False).count()
print(k)

Output:

        HoraDTRetirada       idpass    type
0  2016-02-17 15:36:00  39579449489  single
1  2016-02-18 19:13:00  39579449489  single
2  2016-12-31 09:20:00  72986744521  pseudo
3  2016-02-28 12:11:00  72986744521   round
4  2016-02-28 14:55:00  11533148958  pseudo
5  2016-02-29 12:27:00  72986744521   round
6  2016-02-29 16:32:00  72986744521   round
        idpass week    type  HoraDTRetirada
0  11533148958   09  pseudo               1
1  39579449489   07  single               2
2  72986744521   09   round               3
3  72986744521   52  pseudo               1
Himaprasoon
  • my original dataset has many more columns than the ones listed in the sample, so this method ends up printing out all the other columns that I don't want. The week numbering is just what I wanted though. Thanks for that. – Helk Jul 06 '17 at 13:52
  • I got the desired output by just doing `All_Data.groupby(['idpass','type','week']).size()` instead of your last line. – Helk Jul 06 '17 at 14:09

This is how I got what I was looking for:

Step 1 from the suggested answers was skipped because the timestamps were already in pandas datetime form.

Step 2: create a column for the week of the year:

df['week'] = df['HoraDTRetirada'].dt.strftime('%U')

Step 3: group by user id, type, and week, and count values with size():

df.groupby(['idpass','type','week']).size()
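Put together on a small slice of the sample data (column names assumed from the question), the two steps can be sketched like this, also renaming the count to match the desired `frequency` column:

```python
import pandas as pd

# A slice of the question's sample data.
df = pd.DataFrame({
    "HoraDTRetirada": pd.to_datetime(["2016-02-17 15:36:00",
                                      "2016-02-18 19:13:00",
                                      "2016-02-27 14:55:00"]),
    "idpass": ["39579449489", "39579449489", "11533148958"],
    "type": ["single", "single", "pseudo"],
})

# Week of year ('%U': weeks start on Sunday).
df["week"] = df["HoraDTRetirada"].dt.strftime("%U")

# size() counts rows per group; reset_index names the count column.
out = (df.groupby(["idpass", "week", "type"])
         .size()
         .reset_index(name="frequency"))
print(out)
```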

Helk

My suggestion would be to do this:


  1. Make sure your timestamp column is a pandas datetime, and add a frequency column:

    df['HoraDTRetirada'] = pd.to_datetime(df['HoraDTRetirada'])

    df['freq'] = 1

  2. Group it and count:

    res = df.groupby(['idpass', 'type', pd.Grouper(key='HoraDTRetirada', freq='1W')]).count().reset_index()

  3. Convert the timestamps to a week of the year:

    res['HoraDTRetirada'] = res['HoraDTRetirada'].apply(lambda x: x.week)
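Composed end to end on the question's sample data (column names assumed from the post), the three steps can be sketched like this; `isocalendar()[1]` is used here as a version-safe way to get the week number:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "HoraDTRetirada": pd.to_datetime([
        "2016-02-17 15:36:00", "2016-02-18 19:13:00",
        "2016-02-26 09:20:00", "2016-02-27 12:11:00",
        "2016-02-27 14:55:00", "2016-02-28 12:27:00",
        "2016-02-28 16:32:00",
    ]),
    "idpass": ["39579449489", "39579449489", "72986744521", "72986744521",
               "11533148958", "72986744521", "72986744521"],
    "type": ["single", "single", "pseudo", "round", "pseudo", "round", "round"],
})
df["freq"] = 1  # dummy column so count() yields a frequency

# Bucket timestamps into calendar weeks; '1W' weeks end on Sunday by default.
res = (df.groupby(["idpass", "type",
                   pd.Grouper(key="HoraDTRetirada", freq="1W")])
         .count()
         .reset_index())

# Replace each week-ending timestamp with its ISO week number.
res["HoraDTRetirada"] = res["HoraDTRetirada"].apply(lambda x: x.isocalendar()[1])
print(res)
```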

The final result has one row per (idpass, type, week) combination, with the count in the freq column.

EDIT:

You are right, in your case we should do step 3 before step 2. If you do that, remember that the groupby changes, so step 2 becomes (note it now operates on df, before the groupby):

df['HoraDTRetirada'] = df['HoraDTRetirada'].apply(lambda x: x.week)

and step 3:

res = df.groupby(['idpass', 'type', 'HoraDTRetirada']).count().reset_index()

It's a bit different because the "Hora" variable is not a timestamp anymore, but just an int representing a week.
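As a minimal runnable sketch of the reordered steps (using a slice of the question's sample data; `isocalendar()[1]` is used as a version-safe way to get the week number):

```python
import pandas as pd

# A slice of the question's sample data (names assumed from the post).
df = pd.DataFrame({
    "HoraDTRetirada": pd.to_datetime(["2016-02-27 12:11:00",
                                      "2016-02-28 12:27:00",
                                      "2016-02-28 16:32:00"]),
    "idpass": ["72986744521"] * 3,
    "type": ["round"] * 3,
})
df["freq"] = 1  # dummy column so count() yields a frequency

# Convert to a plain week number first (ISO weeks run Monday-Sunday) ...
df["HoraDTRetirada"] = df["HoraDTRetirada"].apply(lambda x: x.isocalendar()[1])

# ... then group on the int column; no pd.Grouper is needed anymore.
res = df.groupby(["idpass", "type", "HoraDTRetirada"]).count().reset_index()
print(res)
```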

Kacper Wolkowski
  • i wanted to group by week, so I think in your model, the step 3 should come before the step 2, grouping by week instead of datetime. I tried doing that but I got the following error: `TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'` – Helk Jul 06 '17 at 13:50
  • Updated - let me know if it's fine now – Kacper Wolkowski Jul 06 '17 at 14:40