Group by ID and complet time series Pandas

Question

I have a pandas Dataframe with observations of one ID and I have a problem similar to the one solved here.

Timestamp              ID
2014-10-16 15:05:17    123
2014-10-16 14:56:37    148
2014-10-16 14:25:16    123
2014-10-16 14:15:32    123
2014-10-16 13:41:01    123
2014-10-16 12:50:30    148
2014-10-16 12:28:54    123
2014-10-16 12:26:56    123
2014-10-16 12:25:12    123
...
2014-10-08 15:52:49    150
2014-10-08 15:04:50    150
2014-10-08 15:03:48    148
2014-10-08 15:02:27    200
2014-10-08 15:01:56    236
2014-10-08 13:27:28    147
2014-10-08 13:01:08    148
2014-10-08 12:52:06    999
2014-10-08 12:43:27    999
Name: summary, Length: 600

On the mentioned post they show how to group by ID and also how to make the count.Using df['Week/Year'] = df['Timestamp'].apply(lambda x: "%d/%d" % (x.week, x.year)) I have now this:

   Timestamp               ID     Week/Year
0  2014-10-16 15:05:17     123    42/2014
1  2014-10-16 14:56:37     150    42/2014
2  2014-10-16 14:25:16     123    42/2014

My problem is that now I want to make a time series so, actually, I need:

Category    Week_42_2014    Week_43_2014    Week_44_2014    
123         7              0              6
150         0              0              2              ...

This is, I need the weeks as a column, the categories as rows and also fill the gaps of the weeks with no observations. In my case I also need days, but I guess that it is really similar.

Thanks,

This question has been marked as a duplicate because it was not clear my question. I don't want just to pivot the data, I also aim to fill the missing weeks, days without observations with 0 for all the IDs. I change the original explanation to make myself more clear. — Xbel, Oct 08 '18 at 07:52
I rolled back your edit as you shouldn't *change* your question in substance after you have answer(s). Instead, ask a new question with a **[mcve]** so that these problems don't occur. — jpp, Oct 08 '18 at 08:22
It was already in the text ("and also fill the gaps of the weeks with no observations"), I just make it more clear. So, in my opinion, the substance of the question is not changed. But ok, I'll try to be more clear next time. — Xbel, Oct 08 '18 at 08:40

score 3 · Accepted Answer · answered Oct 07 '18 at 18:10

You can use pd.pivot_table:

res = df.pivot_table(index='ID', columns='Week/Year', aggfunc='count', fill_value=0)

print(res)

          Timestamp        
Week/Year   41/2014 42/2014
ID                         
123               0       7
147               1       0
148               2       2
150               2       0
200               1       0
236               1       0
999               2       0

score 1 · Answer 2 · answered Oct 07 '18 at 18:07

You can use pd.crosstab in doing the task

df['date'] = pd.date_range(start='2014-10-16 15:05:17 ',end='2014-11-08 12:43:27 ',freq='D')
df['value'] = np.repeat([11,22,33,44],len(df)/3)[:len(df)]
df['week'] = df.date.dt.week
df['Year'] = df.date.dt.year



df = pd.crosstab(df.value,[df.week,df.Year])
df.columns = 'Week_' +df.columns.levels[0].astype(str)+"_" +df.columns.levels[1].astype(str)

Out:

    Week_42_2014    Week_43_2014    Week_44_2014    Week_45_2014
value               
11  4   3   0   0
22  0   4   3   0
33  0   0   4   3
44  0   0   0   2

Vishnu Kunchur · Answer 3 · 2018-10-07T18:12:40.447

Here's one way to do it using groupby and pivot:

df = df.groupby(['ID', pd.Grouper(key = 'Timestamp', freq = 'W')] ['ID'].count().\
        to_frame().rename(columns = {'ID' : 'counts'}).reset_index()

Then pivot:

df.pivot(index = 'ID', columns = 'Timestamp', values = 'counts')

This would ofcourse, return column names as the first day if the respective week; you can change these by replacing df.columns with a list of your desired column names.

EDIT:

If you already have a column of weeks, you can just use pd.pivot_table without doing a groupby.

Group by ID and complet time series Pandas

3 Answers3