
I am new to matplotlib (1.3.1-2) and cannot find a good place to start. I want to plot the distribution of points over time as a histogram with matplotlib.

Basically, I want to plot the cumulative sum of the occurrences of each date.

date
2011-12-13
2011-12-13
2013-11-01
2013-11-01
2013-06-04
2013-06-04
2014-01-01
...

That would give:

2011-12-13 -> 2 times
2013-11-01 -> 3 times
2013-06-04 -> 2 times
2014-01-01 -> once
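The counting step described above can be sketched in plain Python with collections.Counter, and the running (cumulative) total with itertools.accumulate. This is a minimal sketch that assumes the dates arrive as ISO "YYYY-MM-DD" strings, and it uses only the dates explicitly listed (the elided "..." rows are not included, so 2013-11-01 is counted twice here rather than three times).

```python
from collections import Counter
from datetime import date
from itertools import accumulate

# Only the dates shown in the question; the elided "..." rows are not included.
raw = [
    "2011-12-13", "2011-12-13",
    "2013-11-01", "2013-11-01",
    "2013-06-04", "2013-06-04",
    "2014-01-01",
]

# Parse the ISO strings into date objects so they sort chronologically.
dates = [date.fromisoformat(s) for s in raw]

# Number of occurrences of each distinct date, in chronological order.
counts = Counter(dates)
days = sorted(counts)
per_day = [counts[d] for d in days]

# Running total of occurrences up to and including each date.
cumulative = list(accumulate(per_day))

for d, n, c in zip(days, per_day, cumulative):
    print(d, n, c)
```

In matplotlib itself, passing cumulative=True to ax.hist draws a cumulative histogram directly, without computing the running total by hand.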

Since there will be many points spread over many years, I want to set the start and end dates of my x-axis, mark ticks at fixed time steps (e.g. one-year steps), and choose how many bins there will be.

How would I achieve that?
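One way to pin the histogram to a chosen start and end date and control the number of bins is to compute the bin edges yourself. The sketch below is plain Python; start, end and n_bins are illustrative values chosen here, not taken from the question. Yearly tick positions (January 1st of each year) are computed the same way.

```python
from datetime import datetime, timedelta

# Illustrative values (assumptions): the axis range and the number of bins.
start = datetime(2011, 1, 1)
end = datetime(2015, 1, 1)
n_bins = 48

# n_bins equal-width bins need n_bins + 1 edges running from start to end.
width = (end - start) / n_bins  # timedelta / int -> timedelta
edges = [start + i * width for i in range(n_bins + 1)]

# One tick per year (January 1st) that falls inside [start, end].
year_ticks = [
    datetime(y, 1, 1)
    for y in range(start.year, end.year + 1)
    if start <= datetime(y, 1, 1) <= end
]
```

In recent matplotlib versions these lists can be passed as ax.hist(dates, bins=edges) and ax.set_xticks(year_ticks), because datetimes are converted by matplotlib's date unit support; with older versions, convert them first with matplotlib.dates.date2num.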

four-eyes

3 Answers


Matplotlib uses its own format for dates/times, but the matplotlib.dates module provides simple functions to convert to it. It also provides various Locators and Formatters that take care of placing the ticks on the axis and formatting the corresponding labels. This should get you started:

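import random
from datetime import datetime, timezone
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# generate some random data (approximately over 5 years)
data = [float(random.randint(1271517521, 1429197513)) for _ in range(1000)]

# convert the epoch (Unix seconds) format to matplotlib date format;
# mdates.epoch2num(data) did this in older versions but is deprecated in newer releases
mpl_data = mdates.date2num([datetime.fromtimestamp(t, tz=timezone.utc) for t in data])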

# plot it
fig, ax = plt.subplots(1,1)
ax.hist(mpl_data, bins=50, color='lightblue')
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d.%m.%y'))
plt.show()

Result:

[Image: resulting histogram]
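To see what the conversion does, the sketch below converts two Unix timestamps with matplotlib.dates.date2num. Matplotlib date numbers are floating-point days, so timestamps one day apart differ by exactly 1.0, and num2date converts back to datetimes. The timestamps are illustrative; the absolute numbers depend on the matplotlib version, because the default reference epoch changed to 1970-01-01 in matplotlib 3.3.

```python
from datetime import datetime, timezone

import matplotlib.dates as mdates

# Two illustrative Unix timestamps exactly one day (86400 s) apart.
t0 = 1388534400.0  # 2014-01-01 00:00:00 UTC
t1 = t0 + 86400.0

# Unix seconds -> timezone-aware datetimes -> matplotlib date numbers (days).
nums = mdates.date2num([datetime.fromtimestamp(t, tz=timezone.utc) for t in (t0, t1)])

# One day corresponds to a difference of 1.0 in matplotlib's date units.
print(nums[1] - nums[0])

# num2date converts a date number back to a timezone-aware (UTC) datetime.
back = mdates.num2date(nums[0])
print(back)
```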

hitzg

To add to hitzg's answer, you can use AutoDateLocator and AutoDateFormatter to let matplotlib choose the tick locations and the label format for you:

locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))

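A self-contained version of this snippet, combined with the random data from hitzg's answer, might look like the sketch below. The fixed random seed and the non-interactive Agg backend are choices made here so that it runs reproducibly without a display; they are not part of either answer.

```python
import random
from datetime import datetime, timezone

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

random.seed(0)  # reproducible sample (assumption)
# Random Unix timestamps over roughly five years, as in hitzg's answer.
data = [random.randint(1271517521, 1429197513) for _ in range(1000)]
mpl_data = mdates.date2num([datetime.fromtimestamp(t, tz=timezone.utc) for t in data])

fig, ax = plt.subplots()
ax.hist(mpl_data, bins=50, color="lightblue")

# Let matplotlib choose tick positions and a matching label format.
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.AutoDateFormatter(locator))

fig.canvas.draw()  # render once so the ticks are computed
print(ax.get_xticks())
```

AutoDateFormatter picks its format string from the spacing of the ticks produced by the locator it is given, which is why the same locator object is passed to both.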

Will Vousden

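Here is a more modern solution for matplotlib version 3.5.3: ax.hist accepts datetime objects directly, so no manual conversion is needed.

It also specifies the minimum and maximum dates explicitly, through the range argument of hist, instead of relying on the min/max values derived from the data.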

import random
from datetime import datetime, timedelta
import matplotlib.pyplot as plt

days = 365*3
start_date = datetime.now()
random_dates = [
    start_date + timedelta(days=int(random.random()*days))
    for _ in range(100)
]
end_date = start_date + timedelta(days=days)

fig, ax = plt.subplots(figsize=(5,3))
n, bins, patches = ax.hist(random_dates, bins=52, range=(start_date, end_date))
fig.autofmt_xdate()
plt.show()

[Image: histogram with time on x-axis]
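The n and bins values returned by ax.hist above are not used in the answer. The sketch below fixes the start date and the random seed so the run is reproducible (both are assumptions, not part of the answer), adds yearly ticks with YearLocator as the question asked, and converts the returned bin edges, which are matplotlib date numbers, back to datetimes with num2date.

```python
import random
from datetime import datetime, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

random.seed(42)  # reproducible sample (assumption)
days = 365 * 3
start_date = datetime(2020, 1, 1)  # fixed instead of datetime.now() (assumption)
end_date = start_date + timedelta(days=days)
random_dates = [
    start_date + timedelta(days=int(random.random() * days))
    for _ in range(100)
]

fig, ax = plt.subplots(figsize=(5, 3))
n, bins, patches = ax.hist(random_dates, bins=52, range=(start_date, end_date))

# One major tick per year, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# The returned edges are matplotlib date numbers; convert them back to datetimes.
edges = mdates.num2date(bins)
print(edges[0], edges[-1], int(n.sum()))
```

Because every sampled date lies inside [start_date, end_date), all 100 samples fall into one of the 52 bins.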

BushMinusZero