
I'm obviously making a very basic mistake in adding a rolling mean plot to my figure.

The basic plot of close prices works fine, but as soon as I add the rolling mean to the plot, the x-axis dates get screwed up and I can't see what it's trying to do.

Here's the code:

import pandas as pd
import matplotlib.pyplot as plot

df = pd.read_csv('historical_price_data.csv')
df['Date'] = pd.to_datetime(df.Date, infer_datetime_format=True) 
df.sort_index(inplace=True)

ax = df[['Date', 'Close']].plot(figsize=(14, 7), x='Date', color='black')

rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')

plot.show()

With this sample data set I am getting this figure:

[Figure: Rolling Mean plot - broken x-axis]

Given the simplicity of this code, I'm obviously making a very basic mistake; I just can't see what it is.

EDIT: Interesting. Although @AndreyPortnoy's suggestion to set the index to Date results in the odd error that Date is not in the index, when I use the built-ins per his suggestion the figure is no longer a complete mess. For some reason, though, the x-axis is reversed, and the ticks are no longer dates but apparently ints (?), even though df.dtypes shows Date is datetime64[ns].


@Sandipan Dey: Here's what the dataset looks like. Per the code above, I'm using pd.to_datetime() to convert to datetime64, and I have tried df[::-1] to fix the problem where the axis is reversed when the 2nd plot (mov_avg) is added to the figure (it is not reversed when the figure has only the 1 plot).

[Screenshot: csv columns]
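A minimal sketch of one possible cause of the reversed axis, assuming (hypothetically) that the CSV lists the newest rows first, as many price downloads do: without a date index, rows keep their file order and the default index is already 0, 1, 2, ..., so `df.sort_index()` has nothing to sort. Setting `Date` as the index first and then sorting puts the rows in chronological order. The three-row sample is invented for illustration.

```python
import io

import pandas as pd

# Hypothetical sample, newest date first.
csv_text = """Date,Close
2018-09-05,7.0
2018-09-04,6.0
2018-09-03,5.0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# The default index 0, 1, 2 is already in order, so this changes nothing.
unchanged = df.sort_index()
print(unchanged['Date'].is_monotonic_increasing)  # False: still newest first

# Make Date the index, then sort it so rows run oldest to newest.
df = df.set_index('Date').sort_index()
print(df.index.is_monotonic_increasing)  # True
```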

Mr. T
Compustretch
  • Could be a problem of your data structure. I suggest including a [Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) in your question. – Mr. T Sep 05 '18 at 06:11
  • can you share a few rows of your dataset? – Sandipan Dey Sep 05 '18 at 06:35
  • The data structure seems straightforward: a .csv with the 'Date' column in '2018-09-05' format and a float64 for the 'Close' price data. I am flummoxed as to why setting the index to Date fails like that. Since using the built-ins syntax fixes the initial problem, I would surmise it's syntactical, not data structure? – Compustretch Sep 05 '18 at 06:38
  • @Compustretch Please provide ~10 rows of your dataset. – Andrey Portnoy Sep 05 '18 at 06:39
  • The combination pandas-matplotlib-dates is never straightforward: https://stackoverflow.com/a/44214830/8881141 – Mr. T Sep 05 '18 at 06:41
  • @Mr. T -- "In general the datetime utilities of pandas and matplotlib are incompatible." -- wait, what? – Compustretch Sep 05 '18 at 06:45
  • @Compustretch Please do not include data as pictures; always post it as text. There are a gazillion different ways to import what you have posted, leading to different data structures that behave differently in matplotlib/pandas. You could also post a link to a shorter version of your csv file. – Mr. T Sep 05 '18 at 06:47
  • P.S.: "Incompatible" in this context means that it depends highly on the data structure; the program has difficulty guessing the "right" way to present it in a graph. A general problem in Python with datetime: https://stackoverflow.com/a/21916253/8881141 – Mr. T Sep 05 '18 at 06:51
  • @Mr.T: I hadn't realized that; here's the csv as a data paste: https://pastebin.com/XvDufSkT – Compustretch Sep 05 '18 at 06:52
  • @Compustretch CSV stands for 'comma-separated values', while your file is separated by blank spaces, which results in a single `Date Close` column. – Andrey Portnoy Sep 05 '18 at 07:04
  • @Mr.T Turns out the issue was that I copied and pasted instead of downloading the file. – Andrey Portnoy Sep 05 '18 at 07:12
  • @Compustretch I cannot reproduce your issues, your sample dataset works fine for me. Could you run `pd.show_versions()` and tell us your `pandas` and `matplotlib` versions? – Andrey Portnoy Sep 05 '18 at 07:15
  • pandas: 0.23.0 & matplotlib: 2.2.2, though in any case I think the error I get with your first suggestion, { df.set_index('Date', inplace=True) }, is the most telling clue. That one is baffling. – Compustretch Sep 05 '18 at 07:27
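To illustrate Andrey Portnoy's point about separators, here is a minimal sketch using a hypothetical two-row, space-separated sample: with the default `sep=','`, `pd.read_csv` finds no commas and puts everything in one `Date Close` column, while `sep=r'\s+'` splits on runs of whitespace. Per the follow-up comment, the separator problem came from copying the pasted text rather than downloading the file, so this only matters if your file really is space-separated.

```python
import io

import pandas as pd

# Hypothetical whitespace-separated sample, like the pasted data.
text = """Date Close
2018-09-04 6.0
2018-09-05 7.0
"""

# Default sep=',' finds no commas, so each line lands in a single column.
wrong = pd.read_csv(io.StringIO(text))
print(list(wrong.columns))  # ['Date Close']

# sep=r'\s+' splits on any run of whitespace instead.
right = pd.read_csv(io.StringIO(text), sep=r'\s+', parse_dates=['Date'])
print(list(right.columns))  # ['Date', 'Close']
```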

1 Answer


The fact that your dates for the moving averages start at 1970 suggests that the rolling mean is being plotted against an integer range index (0, 1, 2, ...), which pandas generated by default when you read in the csv file. Try inserting

df.set_index('Date', inplace=True)

before

df.sort_index(inplace=True)

Then you can do

ax = df['Close'].plot(figsize=(14, 7), color='black')
rolling_mean = df.Close.rolling(window=7).mean()
plot.plot(rolling_mean, color='blue', label='Rolling Mean')

Note that I'm not passing x explicitly, letting pandas and matplotlib infer it.
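Putting the answer's steps together, here is a minimal runnable sketch. It assumes a hypothetical ten-day, newest-first sample in place of `historical_price_data.csv`, and it swaps `plot.plot(rolling_mean, ...)` for pandas' own `rolling_mean.plot(ax=ax, ...)` so that both lines go through the same pandas date handling on the shared axes. The `Agg` backend is only there so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; remove it to get a window from plot.show()
import matplotlib.pyplot as plot
import pandas as pd

# Hypothetical stand-in for historical_price_data.csv: ten daily closes, newest first.
dates = pd.date_range('2018-08-27', periods=10, freq='D')
csv_text = "Date,Close\n" + "\n".join(
    f"{d:%Y-%m-%d},{100 + i}.0" for i, d in enumerate(dates[::-1])
)

df = pd.read_csv(io.StringIO(csv_text))
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)  # dates become the index instead of 0, 1, 2, ...
df.sort_index(inplace=True)         # now this actually sorts chronologically

ax = df['Close'].plot(figsize=(14, 7), color='black')

# rolling_mean inherits the DatetimeIndex, so it lines up with Close on the x-axis.
rolling_mean = df['Close'].rolling(window=7).mean()
rolling_mean.plot(ax=ax, color='blue', label='Rolling Mean')
ax.legend()
# plot.show()  # uncomment when using an interactive backend
```

Because `rolling_mean` is computed from the Date-indexed `Close` column, it carries the same `DatetimeIndex`, which is what keeps the two lines on one date axis instead of the rolling mean landing near 1970.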

You can simplify your code by using the builtin plotting facilities like so:

df['mov_avg'] = df['Close'].rolling(window=7).mean()
df[['Close', 'mov_avg']].plot(figsize=(14, 7))
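A minimal sketch of the built-in approach, assuming the frame already has a sorted `DatetimeIndex` named `Date` (the ten daily closes below are hypothetical):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import pandas as pd

# Hypothetical Date-indexed closes, as they would look after set_index + sort_index.
idx = pd.date_range('2018-08-27', periods=10, freq='D', name='Date')
df = pd.DataFrame({'Close': [float(100 + i) for i in range(10)]}, index=idx)

# 7-row window: the first 6 rows lack enough history, so they are NaN.
df['mov_avg'] = df['Close'].rolling(window=7).mean()

# One call draws both columns against the shared DatetimeIndex, with a legend.
ax = df[['Close', 'mov_avg']].plot(figsize=(14, 7))
```

The first six `mov_avg` values are NaN because a 7-row window needs seven observations; passing `min_periods=1` to `rolling` would instead average whatever rows are available at the start.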
Andrey Portnoy
  • When I try setting the index, I actually get KeyError: "['Date'] not in index", which seems confusing, as that is what I'm trying to set. – Compustretch Sep 05 '18 at 06:20
  • Andrey can only guess your data structure. You can now go back and forth with two different data sets, or you can provide a reproducible sample data set so that you both work on the same problem. – Mr. T Sep 05 '18 at 06:37
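A likely explanation for the `KeyError`, offered as an unconfirmed guess: after `df.set_index('Date', inplace=True)`, `Date` is no longer a column, so the question's `df[['Date', 'Close']].plot(..., x='Date')` call asks for a column that no longer exists. The sketch below reproduces that with a hypothetical two-row sample; the exact wording of the error message may vary between pandas versions.

```python
import io

import pandas as pd

# Hypothetical two-row sample.
csv_text = """Date,Close
2018-09-04,6.0
2018-09-05,7.0
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])
df.set_index('Date', inplace=True)

# 'Date' is now the index, not a column, so selecting it as a column fails.
try:
    df[['Date', 'Close']]
except KeyError as err:
    print('KeyError:', err)

# After set_index, select only the value column; the dates come along as the index.
close = df['Close']
```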