
I'd like to show on the same graph a bar chart of a dataframe, and a line chart that represents the sum. I can do that for a frame for which the index is numeric or text. But it doesn't work for a datetime index. Here is the code I use:

import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(1234)
data = np.random.randn(10, 2)
date = dt.datetime.today()
index_nums =  range(10)
index_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k']
index_date = pd.date_range(date + dt.timedelta(days=-9), date)
a_nums = pd.DataFrame(columns=['a', 'b'], index=index_nums, data=data)   
a_text = pd.DataFrame(columns=['a', 'b'], index=index_text, data=data)
a_date = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)

fig, ax = plt.subplots(3, 1)
ax = ax.ravel()
for i, a in enumerate([a_nums, a_text, a_date]):
    a.plot.bar(stacked=True, ax=ax[i])
    (a.sum(axis=1)).plot(c='k', ax=ax[i])


In the resulting figure, the last chart shows only the line, together with the legend of the bar chart, and the dates are missing.

Also if I replace the last line with

ax[i].plot(a.sum(axis=1), c='k')

Then:

  • The chart with index_nums is the same
  • The chart with index_text raises an error
  • The chart with index_date shows the bar chart but not the line chart

I'm using Python 3.6.2, pandas 0.20.3 and matplotlib 2.0.2.

ImportanceOfBeingErnest
Tanguy Bretagne

1 Answer


Plotting a bar plot and a line plot on the same axes can be problematic, because a pandas bar plot puts the bars at integer positions (0, 1, 2, ..., N-1), while a line plot uses the index values themselves to determine the x positions.

In the case from the question, using range(10) as index for both bar and line plot works fine, since those are exactly the numbers a bar plot would use anyway. A text index also works fine, since the labels have to be replaced by numbers in order to be drawn, and the first N integers are used for that.

The bar plot for a datetime index also uses the first N integers, while the line plot will plot on the dates. Hence depending on which one comes first, you only see the line or bar plot (you would actually see the other by changing the xlimits accordingly).
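The mismatch can be made visible by comparing the tick positions the bar plot sets with the numbers matplotlib uses for the same dates. This is only a small sketch: the fixed start date and the variable names `bar_ticks` and `date_positions` are chosen here for illustration and are not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure interactively
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1234)
index_date = pd.date_range("2017-10-01", periods=10)  # arbitrary example dates
df = pd.DataFrame(np.random.randn(10, 2), columns=["a", "b"], index=index_date)

fig, ax = plt.subplots()
df.plot.bar(stacked=True, ax=ax)

# Tick positions set by the pandas bar plot: one integer per row.
bar_ticks = ax.get_xticks()

# Positions a date-based line plot would use: matplotlib date numbers
# (days counted from matplotlib's date epoch), far away from 0..N-1.
date_positions = mdates.date2num(index_date.to_pydatetime())

print("bar ticks:     ", bar_ticks)
print("date positions:", date_positions)
```

The two sets of x values are thousands of units apart, which is why only one of the two plots falls inside the visible x range.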

An easy solution is to plot the bar plot first and reset the index to a numeric one on the dataframe for the line plot.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(1234)
import datetime as dt

data = np.random.randn(10, 2)
date = dt.datetime.today()

index_date = pd.date_range(date + dt.timedelta(days=-9), date)
df = pd.DataFrame(columns=['a', 'b'], index=index_date, data=data)

fig, ax = plt.subplots(1, 1)

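# Bar plot first: pandas places the bars at integer positions 0..N-1.
df.plot.bar(stacked=True, ax=ax)
# Drop the datetime index so the line is drawn at the same integer positions.
df.sum(axis=1).reset_index(drop=True).plot(ax=ax)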

fig.autofmt_xdate()  
plt.show()

Alternatively you can plot the lineplot as usual and use a matplotlib bar plot, which accepts numeric positions. See this answer: Python making combined bar and line plot with secondary y-axis
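A rough sketch of that alternative, not taken from the linked answer: both the bars and the line are drawn with matplotlib directly, at the same date-based x positions. It assumes non-negative data (`np.random.rand` instead of `randn`) so that simple stacking via `bottom=` ends at the row sum. The fixed start date and the bar width of 0.5 days are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure interactively
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1234)
index_date = pd.date_range("2017-10-01", periods=10)  # arbitrary example dates
# Assumption: non-negative values, so stacking with `bottom=` matches the row sum.
df = pd.DataFrame(np.random.rand(10, 2), columns=["a", "b"], index=index_date)

# Convert the dates once to matplotlib date numbers and use them for everything.
x = mdates.date2num(index_date.to_pydatetime())

fig, ax = plt.subplots()
bars_a = ax.bar(x, df["a"].values, width=0.5, label="a")
bars_b = ax.bar(x, df["b"].values, width=0.5, bottom=df["a"].values, label="b")
(line,) = ax.plot(x, df.sum(axis=1).values, c="k", label="sum")

ax.xaxis_date()      # interpret the x values as dates for tick labelling
fig.autofmt_xdate()  # rotate the date labels
ax.legend()
```

Because bars and line share the same x values, the order of the plotting calls no longer matters here.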

ImportanceOfBeingErnest
  • Thanks very much. It is a bit counter-intuitive sometimes to do different types of plots. I was surprised that the second method `ax[i].plot(a.sum(axis=1), c='k')` actually raises an error for a text index. – Tanguy Bretagne Oct 09 '17 at 13:10
  • matplotlib `plot` would require arguments x and y, `ax.plot(range(len(df)), df.sum(axis=1))` or a single array as y argument, `ax.plot(df.sum(axis=1).values)`. – ImportanceOfBeingErnest Oct 09 '17 at 13:17
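Applied to the text index from the question, the suggestion in the last comment might look like the sketch below. The explicit integer x values are what keeps the line aligned with the bars; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view the figure interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(1234)
index_text = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'k']
df = pd.DataFrame(np.random.randn(10, 2), columns=['a', 'b'], index=index_text)

fig, ax = plt.subplots()
df.plot.bar(stacked=True, ax=ax)

# Pass explicit integer x positions and plain values instead of the text-indexed Series.
total = df.sum(axis=1)
(line,) = ax.plot(range(len(df)), total.values, c='k')
```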