6

Despite trying some solutions available on SO and in Matplotlib's documentation, I'm still unable to stop Matplotlib from adding weekend dates to the x-axis.

As you can see below, it adds dates to the x-axis that are not in the original Pandas column.

enter image description here

I'm plotting my data using (commented lines are unsuccessful in achieving my goal):

fig, ax1 = plt.subplots()

x_axis = df.index.values
ax1.plot(x_axis, df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(x_axis, df['R'], color='r')

# plt.xticks(np.arange(len(x_axis)), x_axis)
# fig.autofmt_xdate()
# ax1.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')

fig.tight_layout()
plt.show()

An example of my Pandas dataframe is below, with dates as index:

2019-01-09  1.007042  2585.898714  4.052480e+09  19.980000  12.07     1
2019-01-10  1.007465  2581.828491  3.704500e+09  19.500000  19.74     1
2019-01-11  1.007154  2588.605258  3.434490e+09  18.190001  18.68     1
2019-01-14  1.008560  2582.151225  3.664450e+09  19.070000  14.27     1

Some suggestions I've found include a custom ticker (here and here). However, although I don't get errors, the plot is missing my second series.

Any suggestions on how to disable date interpolation in matplotlib?

pepe
  • 2
    I suspect you frame the problem in the wrong way. Plotting tools don't interpolate, but rather show a *linear* axis by default. Any day is hence part of the axis, similar to how the number `2` is necessarily part of an axis ranging from 1 to 4. A solution on how to plot an axis with dates left out is shown [in the matplotlib FAQ](https://matplotlib.org/faq/howto_faq.html#skip-dates-where-there-is-no-data). – ImportanceOfBeingErnest Jan 20 '19 at 21:10
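To illustrate the comment above, here is a minimal sketch of why the weekend shows up. It uses only the standard library, and `date.toordinal()` stands in for the day count a date axis uses internally (an assumption for illustration; matplotlib's own date numbers have a different origin but are also counted in days). The two dates are the Friday and Monday rows from the sample data.

```python
from datetime import date

# Two consecutive rows from the question's sample data: a Friday and the next Monday.
friday = date(2019, 1, 11)
monday = date(2019, 1, 14)

# A date axis positions points by day count, so the two rows sit 3 units apart,
# leaving room on the axis for Saturday and Sunday.
gap_in_days = monday.toordinal() - friday.toordinal()
print(gap_in_days)

# Plotting against row positions instead spaces every row exactly 1 unit apart.
trading_days = [date(2019, 1, 9), date(2019, 1, 10), friday, monday]
positions = list(range(len(trading_days)))
print(positions)
```

So nothing is being interpolated: the gap comes from using dates as x-coordinates, and it disappears once the x-coordinates are row positions.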

3 Answers

3

The matplotlib site recommends creating a custom formatter class. The idea is to plot the data against integer row positions (0, 1, 2, ...) instead of dates, so that weekends take up no space on the axis, and to let the formatter translate each position back into the corresponding date for the tick label. Here's an example using a dataframe I constructed from the 2018 data in the image you attached:

import pandas as pd

df = pd.DataFrame(
data = {
 "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
 "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
 "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09], 
 "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
 "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
 "R": [0,0,0,0,0,1,1]
}, 
index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
       '2018-01-12', '2018-01-15', '2018-01-16'])
  1. Move the dates from the index to their own column:
df = df.reset_index().rename(columns={'index': 'Date'})
df['Date'] = pd.to_datetime(df['Date'])
  2. Create the custom formatter class:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import Formatter
# Optional, Jupyter only (nicer looking graphs on retina displays):
# %config InlineBackend.figure_format = 'retina'

class CustomFormatter(Formatter):
    def __init__(self, dates, fmt='%Y-%m-%d'):
        # Store as a list so lookups are positional, whatever index the input has
        self.dates = list(dates)
        self.fmt = fmt

    def __call__(self, x, pos=0):
        """Return the label for the row position x (pos is the tick number)."""
        ind = int(np.round(x))
        # Ticks that fall outside the data get no label
        if ind >= len(self.dates) or ind < 0:
            return ''

        return self.dates[ind].strftime(self.fmt)
  3. Now let's plot the MP and R series. Pay attention to the line where we call the custom formatter:
formatter = CustomFormatter(df['Date'])

fig, ax1 = plt.subplots()
ax1.xaxis.set_major_formatter(formatter)
ax1.plot(np.arange(len(df)), df['MP'], color='k')
ax2 = ax1.twinx()
ax2.plot(np.arange(len(df)), df['R'], color='r')
fig.autofmt_xdate()
fig.tight_layout()
plt.show()

The above code outputs this graph: Output graph

Now, no weekend dates, such as 2018-01-13, are displayed on the x-axis.
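For comparison, the same idea can be sketched without a custom class by passing a plain function to `matplotlib.ticker.FuncFormatter`. This is only a sketch under a few assumptions: `label_for_position` is a hypothetical helper name, the data is the MP and R columns from the dataframe above, and `MaxNLocator(integer=True)` is my addition to keep ticks on whole row positions.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Dates and the two series, copied from the dataframe in this answer.
dates = pd.to_datetime(['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11',
                        '2018-01-12', '2018-01-15', '2018-01-16'])
mp = [2743.002071, 2754.011543, 2746.121450, 2760.169848,
      2780.756857, 2793.953050, 2792.675162]
r = [0, 0, 0, 0, 0, 1, 1]


def label_for_position(x, pos=None):
    """Hypothetical helper: map a row position to its date label ('' if out of range)."""
    ind = int(np.round(x))
    if ind < 0 or ind >= len(dates):
        return ''
    return dates[ind].strftime('%Y-%m-%d')


# Both series share the same integer x-coordinates, so neither one goes missing.
positions = np.arange(len(dates))
fig, ax1 = plt.subplots()
ax1.plot(positions, mp, color='k')
ax2 = ax1.twinx()
ax2.plot(positions, r, color='r')

# Ticks only on whole row positions, each labeled with that row's date.
ax1.xaxis.set_major_locator(MaxNLocator(integer=True))
ax1.xaxis.set_major_formatter(FuncFormatter(label_for_position))
fig.autofmt_xdate()
fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure in an interactive session. For example, `label_for_position(5)` looks up the sixth row, which is the Monday after the skipped weekend.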

James Dellinger
  • yep, this did the trick, thanks so much... I guess moving the 'Date' out of index and using your formatter were the crucial steps I needed – pepe Jan 26 '19 at 19:23
2

If you simply want to avoid labeling the weekends while keeping the graph scaled correctly in time, matplotlib has built-in functionality for this in matplotlib.dates. Specifically, the WeekdayLocator pretty much solves this problem single-handedly. It's a one-line solution (the rest just fabricates data for testing). Note that this works whether or not the data includes weekends:

import matplotlib.pyplot as plt
import datetime
import numpy as np
import matplotlib.dates as mdates
from matplotlib.dates import MO, TU, WE, TH, FR, SA, SU

DT_FORMAT="%Y-%m-%d"

if __name__ == "__main__":
    N = 14
    #Fake data
    x =  list(zip([2018]*N, [5]*N, list(range(1,N+1))))
    x = [datetime.datetime(*y) for y in x]
    x = [y for y in x if y.weekday() < 5]
    random_walk_steps = 2 * np.random.randint(0, 6, len(x)) - 3
    random_walk = np.cumsum(random_walk_steps)
    y = np.arange(len(x)) + random_walk

    # Make a figure and plot everything
    fig, ax = plt.subplots()
    ax.plot(x, y)

    ### HERE IS THE BIT THAT ANSWERS THE QUESTION
    ax.xaxis.set_major_locator(mdates.WeekdayLocator(byweekday=(MO, TU, WE, TH, FR)))
    ax.xaxis.set_major_formatter(mdates.DateFormatter(DT_FORMAT))

    # plot stuff
    fig.autofmt_xdate()
    plt.tight_layout()
    plt.show()

enter image description here

Him
  • thanks for chiming in @scott, I need the chart to be continuous, see solution below/above – pepe Jan 26 '19 at 19:25
1

If you are trying to avoid matplotlib drawing a line across the days that are missing from your dataset, you can exploit the fact that matplotlib starts a new line segment each time it encounters np.nan. Pandas makes it easy to insert NaN for the days that are not in your dataset with pd.DataFrame.asfreq():

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(data = {
    "Col 1" : [1.000325, 1.000807, 1.001207, 1.000355, 1.001512, 1.003237, 1.000979],
    "MP": [2743.002071, 2754.011543, 2746.121450, 2760.169848, 2780.756857, 2793.953050, 2792.675162],
    "Col 3": [3.242650e+09, 3.453480e+09, 3.576350e+09, 3.641320e+09, 3.573970e+09, 3.573970e+09, 4.325970e+09],
    "Col 4": [9.520000, 10.080000, 9.820000, 9.880000, 10.160000, 10.160000, 11.660000],
    "Col 5": [5.04, 5.62, 5.29, 6.58, 8.32, 9.57, 9.53],
    "R": [0,0,0,0,0,1,1]
    },
    index=['2018-01-08', '2018-01-09', '2018-01-10', '2018-01-11', '2018-01-12', '2018-01-15', '2018-01-16'])

df.index = pd.to_datetime(df.index)

# Rescale R onto MP's range so a twin axis isn't needed: 0 -> MP.min(), 1 -> MP.max()
df['R'] = df['R'].map({0: df['MP'].min(), 1: df['MP'].max()})

df = df.asfreq('D')

df
               Col 1           MP         Col 3  Col 4  Col 5            R
2018-01-08  1.000325  2743.002071  3.242650e+09   9.52   5.04  2743.002071
2018-01-09  1.000807  2754.011543  3.453480e+09  10.08   5.62  2743.002071
2018-01-10  1.001207  2746.121450  3.576350e+09   9.82   5.29  2743.002071
2018-01-11  1.000355  2760.169848  3.641320e+09   9.88   6.58  2743.002071
2018-01-12  1.001512  2780.756857  3.573970e+09  10.16   8.32  2743.002071
2018-01-13       NaN          NaN           NaN    NaN    NaN          NaN
2018-01-14       NaN          NaN           NaN    NaN    NaN          NaN
2018-01-15  1.003237  2793.953050  3.573970e+09  10.16   9.57  2793.953050
2018-01-16  1.000979  2792.675162  4.325970e+09  11.66   9.53  2793.953050

df[['MP', 'R']].plot(); plt.show()

enter image description here
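The effect of `asfreq('D')` can be seen in isolation with a minimal sketch. It assumes a two-row series built from the Friday and Monday MP values above; the variable names `s`, `daily`, and `restored` are only for illustration.

```python
import pandas as pd

# Friday and the following Monday, using the MP values from the dataframe above.
s = pd.Series([2780.756857, 2793.953050],
              index=pd.to_datetime(['2018-01-12', '2018-01-15']))

# Reindex to calendar days: Saturday and Sunday are added as NaN rows.
daily = s.asfreq('D')
print(daily)

# Dropping the NaN rows gives back the original values.
restored = daily.dropna()
print(restored)
```

Because the inserted weekend rows hold NaN, a line plot of `daily` is drawn as separate segments on either side of them rather than as one line across the weekend.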

Kyle
  • thanks for chiming in @kyle, I need the chart to be continuous, see solution below/above – pepe Jan 26 '19 at 19:23