import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import matplotlib.dates as dts


def use_matplot():

    ax = df.plot(x='year', kind='area')

    years = dts.YearLocator(20)
    ax.xaxis.set_major_locator(years)

    fig = ax.get_figure()
    fig.savefig('output.pdf')


dates = np.arange(1990,2061, 1)
dates = dates.astype('str').astype('datetime64')

df = pd.DataFrame(np.random.randint(0, dates.size, size=(dates.size,3)), columns=list('ABC'))
df['year'] = dates

cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]

use_matplot()

In the above code, I get an error, "ValueError: year 0 is out of range" when trying to set the YearLocator so as to ensure the X-Axis has year labels for every 20th year. By default the plot has the years show up every 10 years. What am I doing wrong? Desired outcome is simply a plot with 1990, 2010, 2030, 2050 on the bottom. (Instead of default 1990, 2000, 2010, etc.)

bpdronkers
  • Sure, start by providing a [mcve]. I.e. create some dataframe in the code such that the code itself is runnable and reproduces your problem. Once that is done one may easily find a solution. – ImportanceOfBeingErnest Oct 12 '17 at 22:30
  • Thank you. I will see what I can do today. I guess it will be good practice to create a separate example. In the meantime, if you know the method to call for changing the x-axis interval so that instead of reading 2000, 2010, 2020, ... 2060 it reads 2000, 2020, 2040, 2060 (i.e. I just want fewer labels), then please let me know :) – bpdronkers Oct 13 '17 at 18:26
  • I've updated my code to something that's minimal, complete and verifiable. Hope this helps. It still gives me an error though. If I remove the YearLocator then it works. – bpdronkers Oct 13 '17 at 22:26

2 Answers


Since the years are simple numbers, you may opt for not using them as dates at all and keeping them as numbers.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dates = np.arange(1990,2061, 1)

df = pd.DataFrame(np.random.randint(0,dates.size,size=(dates.size,3)),columns=list('ABC'))
df['year'] = dates

cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]

ax = df.plot(x='year', kind="area" )
ax.set_xticks(range(2000,2061,20))

plt.show()


Apart from that, using Matplotlib locators and formatters on date axes created via pandas will most often fail, because pandas uses a completely different datetime convention for its axes. To have more freedom in setting custom tickers on a datetime axis, you may plot with matplotlib directly instead of going through pandas. A stacked area plot can be produced with plt.stackplot, and on such a matplotlib plot the usual matplotlib tickers work without problems.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dts

dates = np.arange(1990,2061, 1)

df = pd.DataFrame(np.random.randint(0,dates.size,size=(dates.size,3)),columns=list('ABC'))
df['year'] = pd.to_datetime(dates.astype(str)) 

cols = df.columns.tolist()
cols = [cols[-1]] + cols[:-1]
df = df[cols]

plt.stackplot(df["year"].values, df[list('ABC')].values.T)

years = dts.YearLocator(20)
plt.gca().xaxis.set_major_locator(years)

plt.margins(x=0)
plt.show()
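
To make the "different datetime convention" a bit more concrete, here is a minimal sketch. It rests on an assumption that the answer above does not state explicitly: for a regularly spaced datetime x-axis, pandas plots against period ordinals (for yearly data, years counted from 1970), whereas matplotlib's date locators expect day counts relative to matplotlib's own epoch. If that holds, YearLocator would read the small pandas values as dates close to the start of the calendar, which would fit the "year 0 is out of range" error.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import pandas as pd

# Ordinal of the yearly period 1990 in pandas: years since 1970.
period_ordinal = pd.Period("1990", freq="Y").ordinal

# The same instant as a matplotlib date number: days since matplotlib's epoch.
# The exact value depends on the matplotlib version and its epoch setting.
mpl_datenum = mdates.date2num(datetime(1990, 1, 1))

print("pandas period ordinal:", period_ordinal)
print("matplotlib date number:", mpl_datenum)
```

The two numbers describe the same year on very different scales, so a locator written for one scale cannot make sense of axis values given on the other.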


ImportanceOfBeingErnest
  • Thanks again. Question: why do we need to convert the years column in the dataframe to values in order for the YearLocator to work? If it's a datetime method, shouldn't it work with datetime x-axis values? – bpdronkers Oct 16 '17 at 00:25
  • What do you mean by "convert to values"? As the answer says, "pandas is using a completely different datetime convention". So the only problem is that the pandas axis formatter cannot be easily adjusted to show a different interval than the default one. – ImportanceOfBeingErnest Oct 16 '17 at 09:20
  • So basically, it doesn't matter if the format of x-values are supplied as integers or as datetime64 types when fed into the stackplot method, when using YearLocator. – bpdronkers Oct 17 '17 at 17:39
  • No, the first solution uses numbers without a YearLocator, the second uses dates and a YearLocator. You will only be able to use a Date locator for dates. – ImportanceOfBeingErnest Oct 17 '17 at 18:02
  • Sorry - I thought the .values argument didn't pass on the datatype (only the integer/float values). But it does! Thanks again. – bpdronkers Oct 19 '17 at 23:16
  • The `.values` attribute is the DataFrame without column names or index. This is a numpy array and it has the same datatype as the dataframe columns, because internally DataFrames *are* more or less extended numpy arrays. (Hence `.values` can indeed be an attribute and not a method that would convert something.) – ImportanceOfBeingErnest Oct 20 '17 at 07:55
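
The point about `.values` keeping the datatype can be checked with a small sketch. The frame below is a shortened, hypothetical stand-in for the one in the answer, not the original data.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the answer's frame: a datetime 'year' column plus one data column.
years = np.arange(1990, 1995)
df = pd.DataFrame({"year": pd.to_datetime(years.astype(str)), "A": range(5)})

# .values hands back a plain numpy array and keeps the column's datetime dtype.
year_values = df["year"].values
print(type(year_values).__name__, year_values.dtype)
```

Because the array is still datetime-typed, matplotlib sets up a date axis for it, which is why YearLocator works in the stackplot version.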

Consider using set_xticklabels to specify values of x axis tick marks:

ax.set_xticklabels(sum([[i,''] for i in range(1990, 2060, 20)], []))
# [1990, '', 2010, '', 2030, '', 2050, '']
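
One caveat, as far as I can tell: `set_xticklabels` only renames whatever ticks currently exist, so the labels match the data only if the default ticks happen to sit at 1990, 2000, and so on. A more defensive sketch, assuming the integer-year frame from the first answer (rebuilt here with a seeded random generator), fixes the tick positions first and then labels exactly those positions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import numpy as np
import pandas as pd

# Integer-year frame in the spirit of the first answer (seeded for repeatability).
years = np.arange(1990, 2061)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, years.size, size=(years.size, 3)), columns=list("ABC"))
df.insert(0, "year", years)

ax = df.plot(x="year", kind="area")

# Pin the tick positions first, then label those same positions.
positions = list(range(1990, 2061, 20))
ax.set_xticks(positions)
ax.set_xticklabels([str(p) for p in positions])

ax.get_figure().canvas.draw()  # make sure tick labels are materialised
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Setting positions and labels together keeps each label tied to the year it names, whatever ticks matplotlib would have chosen by default.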


Parfait