I have a dataframe in this format:

     DATE        NAME   ARRIVAL TIME
275  2018-07-05  Adam   19:33:51.579885
276  2018-07-05  Bill   19:38:57.578135
277  2018-07-05  Cindy  19:40:24.704381
278  2018-07-05  Don    19:34:29.689414
279  2018-07-05  Eric   19:33:54.173609

I would like to plot a histogram of arrival times in fixed buckets, e.g. every 10 minutes.

Using the following code from other answers, I've managed to produce the following histogram:

df['ARRIVAL TIME'] = pd.to_datetime(df['ARRIVAL TIME'])
plt.hist([t.hour + t.minute/60. for t in df['ARRIVAL TIME']], bins = 8)

[image: histogram of arrival times with 8 bins]

That's close to what I want. However, I'd prefer the bins to be "7:30", "7:40", etc.

Plato's Cave

1 Answer

If you just want to alter the default tick labels manually (see, e.g., this answer), the following should work after running the commands you already have:

plt.draw()      # do this so that the labels are generated
ax = plt.gca()  # get the figure axes
xticks = ax.get_xticklabels()  # get the current x-tick labels
newlabels = []
for label in xticks:
    # convert decimal hours to whole minutes first, so float error cannot turn 7:40 into 7:39
    h, m = divmod(int(round(float(label.get_text())*60)), 60)  # get hours and minutes
    newlabels.append('{0:02d}:{1:02d}'.format(h % 12, m))  # create the new label (in 12 hour clock)

ax.set_xticklabels(newlabels)  # set the new labels
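
As an alternative to reading the label text back from the axes, the same conversion can be written as a standalone function. This is only a sketch: the name `format_decimal_hour` is made up for illustration, and it assumes the x-values are decimal hours, as in the question's `t.hour + t.minute/60.`. Unlike the loop above, it shows 12 rather than 0 for the twelve o'clock hour.

```python
def format_decimal_hour(x, pos=None):
    """Format a value in decimal hours as 'H:MM' on a 12 hour clock."""
    # round to whole minutes first, so 19.6667 gives 40 minutes rather than 39
    total_minutes = int(round(x * 60))
    hours, minutes = divmod(total_minutes, 60)
    # 'hours % 12 or 12' shows 12 instead of 0 for the twelve o'clock hour
    return '{0:d}:{1:02d}'.format(hours % 12 or 12, minutes)


print(format_decimal_hour(19.5))
print(format_decimal_hour(19 + 40 / 60.0))
```

The `(x, pos)` signature is the one `matplotlib.ticker.FuncFormatter` expects, so it should be possible to attach it with `ax.xaxis.set_major_formatter(FuncFormatter(format_decimal_hour))` instead of overwriting the labels by hand.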

But if you specifically want the histogram bin edges to fall on 10-minute intervals, you can do the following:

import numpy as np

# get a list of the times
times = [t.hour + t.minute/60. for t in df['ARRIVAL TIME']]

# set the time interval required (in minutes)
tinterval = 10.

# find the lower and upper bin edges (on an integer number of 10 mins past the hour)
lowbin = np.min(times) - np.fmod(np.min(times)-np.floor(np.min(times)), tinterval/60.)
highbin = np.max(times) - np.fmod(np.max(times)-np.ceil(np.max(times)), tinterval/60.)
# np.arange excludes its stop value, so extend it by half a step to keep the upper edge
bins = np.arange(lowbin, highbin + 0.5*tinterval/60., tinterval/60.)  # set the bin edges

# create the histogram
plt.hist(times, bins=bins)
ax = plt.gca()  # get the current plot axes
ax.set_xticks(bins)  # set the position of the ticks to the histogram bin edges

# create new labels in hh:mm format (in twelve hour clock)
newlabels = []
for edge in bins:
    # work in whole minutes, so float error cannot turn 7:40 into 7:39
    h, m = divmod(int(round(edge*60)), 60)  # get hours and minutes
    newlabels.append('{0:01d}:{1:02d}'.format(h % 12, m))  # create the new label (in 12 hour clock)

ax.set_xticklabels(newlabels)  # set the new labels
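
If the floating-point edge arithmetic above gives trouble, one option is to do the binning in whole minutes since midnight. The following is a minimal sketch rather than a drop-in replacement: `minute_bins` is a hypothetical helper name, the sample times are copied from the question (microseconds dropped), and it assumes each entry has `.hour`, `.minute` and `.second` attributes, which `datetime.time` objects and pandas timestamps both provide.

```python
import datetime as dt
import math


def minute_bins(times, interval=10):
    """Return (values, edges, labels) for binning times of day.

    values: minutes since midnight for each time
    edges:  integer bin edges in minutes, on multiples of `interval`
    labels: 'H:MM' strings (12 hour clock), one per edge
    """
    values = [t.hour * 60 + t.minute + t.second / 60.0 for t in times]
    # round the lowest value down to a multiple of the interval
    low = int(math.floor(min(values) / interval)) * interval
    # take the next multiple strictly above the highest value
    high = (int(math.floor(max(values) / interval)) + 1) * interval
    edges = list(range(low, high + 1, interval))
    # 'or 12' shows 12 instead of 0 for the twelve o'clock hour
    labels = ['{0:d}:{1:02d}'.format((e // 60) % 12 or 12, e % 60) for e in edges]
    return values, edges, labels


# sample arrival times taken from the question's dataframe
sample = [dt.time(19, 33, 51), dt.time(19, 38, 57), dt.time(19, 40, 24),
          dt.time(19, 34, 29), dt.time(19, 33, 54)]
values, edges, labels = minute_bins(sample)
print(edges)
print(labels)
```

The results would then be used in the same way as above: `plt.hist(values, bins=edges)`, followed by `ax.set_xticks(edges)` and `ax.set_xticklabels(labels)`. Because the edges are integers, the labels cannot drift off the ten-minute marks.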
Matt Pitkin
  • In this particular case, this code makes the tick labels "7:36", "7:48", and so on (though it's different for an alternative filtering of the dataframe). I would like the bins to correspond to e.g. 7:30pm-7:40pm, 7:40pm-7:50pm, etc. – Plato's Cave Dec 05 '18 at 20:49
  • @Plato'sCave I've updated the answer to hopefully more fully answer your question. – Matt Pitkin Dec 05 '18 at 22:04