Pandas: How to analyse data with start and end timestamp?

Question

I have to analyze the activity of users who use an application during given periods; each period has a start and an end timestamp. I tried a bar chart, but I do not know how to include every hour in the interval. For example, the user with uid=2 uses the application during hours [18, 19, 20, 21].

My dataframe is like:

uid           sex          start                 end
1             0       2000-01-28 16:47:00   2000-01-28 17:47:00
2             1       2000-01-28 18:07:00   2000-01-28 21:47:00
3             1       2000-01-28 18:47:00   2000-01-28 20:17:00
4             0       2000-01-28 08:00:00   2000-01-28 10:00:00
5             1       2000-01-28 02:05:00   2000-01-28 02:30:00
6             0       2000-01-28 15:10:00   2000-01-28 18:04:00
7             0       2000-01-28 01:50:00   2000-01-28 03:00:00


df['hour_s'] = pd.to_datetime(df['start']).dt.hour
df['hour_e'] = pd.to_datetime(df['end']).dt.hour

uid           sex          start                 end              hour_s      hour_e
1             0       2000-01-28 16:47:00   2000-01-28 17:47:00   16          17
2             1       2000-01-28 18:07:00   2000-01-28 21:47:00   18          21
3             1       2000-01-28 18:47:00   2000-01-28 20:17:00   18          20
4             0       2000-01-28 08:00:00   2000-01-28 10:00:00   08          10
5             1       2000-01-28 02:05:00   2000-01-28 02:30:00   02          02
6             0       2000-01-28 15:10:00   2000-01-28 18:04:00   15          18
7             0       2000-01-28 01:50:00   2000-01-28 03:00:00   01          03

I have to find the number of users in a specific hour.

[This blog post](http://www.clowersresearch.com/main/gantt-charts-in-matplotlib/) gives a detailed example of what you want, please take a look — Vinícius Figueiredo, Jul 28 '17 at 01:47
And even better, [here](https://stackoverflow.com/questions/43367690/how-to-get-gantt-plot-using-matplotlib) — Vinícius Figueiredo, Jul 28 '17 at 01:53

score 1 · Accepted Answer · answered Jul 30 '17 at 01:00

I'm not sure whether you are looking for a Gantt chart. If so, the hints from @Vinícius Aguiar in the comments cover it.

From your last line

I have to find the number of users in a specific hour.

It seems you need a histogram showing the number of users (frequency) by hour of day. If that is the case, you can do something like this:

#!/usr/bin/python3

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Read the data
df = pd.read_csv("data.csv")

# Get all hours per user (per observation), start and end hour inclusive
def sum_hours(obs):
    return list(range(obs['hour_s'], obs['hour_e'] + 1))

# Get all existing activity hours (no matter which user)
Hours2D = list(df.apply(sum_hours, axis=1))
# Flatten into a single list of hours
HoursFlat = [hour for sublist in Hours2D for hour in sublist]

# One bin per hour of the day; the default of 10 bins would merge adjacent hours
plt.hist(HoursFlat, bins=24, rwidth=0.5, range=(0, 24))
plt.xticks(np.arange(0, 24, 1.0))
plt.xlabel('Hour of day')
plt.ylabel('Users')
plt.show()

Where data.csv is the sample you provided:

uid,sex,start,end,hour_s,hour_e
1,0,2000-01-28 16:47:00,2000-01-28 17:47:00,16,17
2,1,2000-01-28 18:07:00,2000-01-28 21:47:00,18,21
3,1,2000-01-28 18:47:00,2000-01-28 20:17:00,18,20
4,0,2000-01-28 08:00:00,2000-01-28 10:00:00,08,10
5,1,2000-01-28 02:05:00,2000-01-28 02:30:00,02,02
6,0,2000-01-28 15:10:00,2000-01-28 18:04:00,15,18
7,0,2000-01-28 01:50:00,2000-01-28 03:00:00,01,03

You should get the following graph:

You can also take Hours2D and HoursFlat variables and run additional analysis, not just visualization. (Outliers, clustering by day-time etc.) — AChervony, Jul 30 '17 at 01:06
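As a rough sketch of that kind of follow-up analysis, the flattened hours can be tallied directly instead of plotted. The (hour_s, hour_e) pairs below are copied from the sample data; the names `intervals`, `hours_flat` and `users_per_hour` are only illustrative, and this assumes each user appears once, as in the sample.

```python
from collections import Counter

# (hour_s, hour_e) pairs copied from the sample data in the question
intervals = [(16, 17), (18, 21), (18, 20), (8, 10), (2, 2), (15, 18), (1, 3)]

# Same expansion as sum_hours: every hour from start to end, inclusive
hours_flat = [h for s, e in intervals for h in range(s, e + 1)]

# Number of users active in each hour of the day
users_per_hour = Counter(hours_flat)

# The hour with the most active users, and how many users that is
busiest_hour, busiest_count = users_per_hour.most_common(1)[0]
print(busiest_hour, busiest_count)
```

Looking up `users_per_hour[h]` gives the count for a single hour `h`, and hours with no activity simply return 0.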
Note that this is only a directional example, and will only work if your observations are unique per user per day as they are in your sample dataset. — AChervony, Jul 30 '17 at 15:31
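If a user can have several sessions, one possible way around that limitation is to count distinct uids per hour rather than rows. The sketch below uses made-up data (uid 2 has two sessions that both touch hour 19) and still assumes every session starts and ends on the same calendar day; the column name `hour` and the variable `per_hour` are not from the original post.

```python
import pandas as pd

# Hypothetical data: uid 2 has two sessions that both touch hour 19
df = pd.DataFrame({
    "uid": [1, 2, 2, 3],
    "start": pd.to_datetime([
        "2000-01-28 16:47:00", "2000-01-28 18:07:00",
        "2000-01-28 19:30:00", "2000-01-28 18:47:00",
    ]),
    "end": pd.to_datetime([
        "2000-01-28 17:47:00", "2000-01-28 19:10:00",
        "2000-01-28 21:47:00", "2000-01-28 20:17:00",
    ]),
})

# List every hour each session touches (start and end hour inclusive);
# this assumes a session does not cross midnight
df["hour"] = [
    list(range(s, e + 1))
    for s, e in zip(df["start"].dt.hour, df["end"].dt.hour)
]

# One row per (session, hour), then count distinct users in each hour
exploded = df.explode("hour")
exploded["hour"] = exploded["hour"].astype(int)
per_hour = (
    exploded.groupby("hour")["uid"]
    .nunique()
    .reindex(range(24), fill_value=0)  # include hours with no activity
)
print(per_hour[per_hour > 0])
```

Here hour 19 counts 2 users even though three session rows cover it, because uid 2 is only counted once.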
