Time series data that needs to be sampled to every 15 minutes and plotted

Question

I have a data frame that looks like this:

                     counts  month
login_time
1970-03-14 17:45:52       3    Mar
1970-01-09 01:31:25       3    Jan
1970-04-12 04:03:15       3    Apr
1970-02-24 23:09:57       3    Feb
1970-04-04 01:17:40       3    Apr
1970-02-12 11:16:53       3    Feb
1970-03-17 01:01:39       3    Mar
1970-01-06 21:45:52       3    Jan
1970-03-29 03:24:57       3    Mar
1970-04-03 14:42:38       2    Apr

I would like to aggregate these login counts by 15 min intervals and then plot the results.

I tried the following:

df.groupby('login_time').resample('15min').count()

but the resampled output doesn't look correct:

                                         counts  month
login_time          login_time
1970-01-01 20:12:16 1970-01-01 20:00:00       1      1
1970-01-01 20:13:18 1970-01-01 20:00:00       1      1
1970-01-01 20:16:10 1970-01-01 20:15:00       1      1
1970-01-01 20:16:36 1970-01-01 20:15:00       1      1
1970-01-01 20:16:37 1970-01-01 20:15:00       1      1
1970-01-01 20:21:41 1970-01-01 20:15:00       1      1
1970-01-01 20:26:05 1970-01-01 20:15:00       1      1
1970-01-01 20:26:21 1970-01-01 20:15:00       1      1
1970-01-01 20:31:03 1970-01-01 20:30:00       1      1
1970-01-01 20:34:46 1970-01-01 20:30:00       1      1

Thank you!

By 15 min intervals, do you mean you would like to bin by intervals of 15 min, beginning with midnight? — liorr, May 31 '20 at 01:27

liorr · Answer 1 · 2020-05-31T02:06:28.320

Not sure if that's exactly what you meant, since you did not specify if you're interested in bins of 15 min from midnight or from the beginning of the dataset, but here's something that I think would work:

I generated random dates in some range (to have something to bin) using that answer.

import pandas as pd
import numpy as np

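# Make some fake data
def random_date_generator(start_date, range_in_seconds):
    # start_date includes a time of day, so np.datetime64 has second resolution
    # and adding an integer shifts it by that many seconds (not days).
    seconds_to_add = np.arange(0, range_in_seconds)
    random_date = np.datetime64(start_date) + np.random.choice(seconds_to_add)
    return random_date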

data_length = 1000
date_col = [random_date_generator('1970-01-01 00:00:00', 100000) for dc in np.arange(data_length)]
count_col = np.random.randint(5, size = data_length)

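# Build the frame and index it by login time
df = pd.DataFrame({'login_time':date_col, 'counts': count_col})
df = df.set_index(['login_time'])

# Count rows per 15-minute bin ('15min' replaces the '15T' alias, deprecated in pandas 2.2)
df.resample('15min').count()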
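
Since the question also asks about plotting, here is a minimal sketch of how the binned result could be plotted. The sample data, the seeded generator, and the helper name plot_binned are my own assumptions for illustration, not something from your data. It also shows the difference between counting rows per bin and summing the existing counts column, since I'm not sure which one you need.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 1000 logins at random second offsets within ~28 hours.
rng = np.random.default_rng(0)
offsets = rng.integers(0, 100_000, size=1000)
login_time = pd.Timestamp("1970-01-01") + pd.to_timedelta(offsets, unit="s")
df = pd.DataFrame({"counts": rng.integers(0, 5, size=1000)}, index=login_time)
df.index.name = "login_time"
df = df.sort_index()

# Resample directly on the DatetimeIndex; no groupby is needed.
# size() gives the number of rows (logins) in each 15-minute bin,
# sum() gives the total of the existing 'counts' column in each bin.
logins_per_bin = df.resample("15min").size()
counts_per_bin = df["counts"].resample("15min").sum()


def plot_binned(series, title):
    """Line plot of a resampled Series (hypothetical helper, needs matplotlib)."""
    import matplotlib.pyplot as plt

    ax = series.plot(figsize=(10, 4), title=title)
    ax.set_xlabel("login_time (15-minute bins)")
    ax.set_ylabel("count")
    plt.tight_layout()
    return ax
```

Calling plot_binned(logins_per_bin, "Logins per 15 minutes") followed by matplotlib's plt.show() should display the chart. As for the attempt in the question: grouping by 'login_time' first turns every distinct timestamp into its own group, so each row gets resampled on its own, which is why that output has one row per login. By default the bins are aligned to midnight of the first day; if you would rather start them at the first timestamp in the data, resample('15min', origin='start') should do that (pandas 1.1 or later).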
