
I have a bar plot and a line plot that share the same x-axis, and I want to draw them together. Here's the picture:

[image]

I want the combined plot to keep "average_daily_price" on the y-axis and not show a y-axis for "num_sales". Here's the result I want to achieve: [image]

I've tried the following

fig, ax1 = plt.subplots()
sns.lineplot(filtered_df, x='date', y='average_daily_price', ax=ax1)
sns.barplot(filtered_df, x="date", y="num_sales", alpha=0.5, ax=ax1)

But it gives a weird result. I've also tried twinx() but couldn't make it work; besides, it creates a second y-axis, which I don't want.

Edit: running Rafael's code results in this plot: [image]

I'd like to add that the date column is in datetime64[ns] format.

Edit 2: This post has been closed as a duplicate. I've already seen the posts in the duplicate list and tried the solutions listed there, but they do not work in my case. I don't know why, and that's what I'm trying to figure out by opening a new question. I'm guessing it has to do with my x variable being a datetime object.

Ebrin

1 Answer


Seaborn's "barplot" is designed for categorical variables. It therefore treats each date as a distinct category and draws the bars at consecutive positions, one per category, instead of at their actual date values. This breaks the date behavior of the x-axis.
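To make the categorical behavior concrete, here is a minimal sketch (the five-row DataFrame is made-up illustration data, not the asker's) that draws a seaborn bar plot over dates and then inspects where the bars were actually placed. Under the default categorical handling, the bar centers should come out as consecutive integer positions rather than date values, which is why a date-based line plot on the same axes ends up far away from the bars.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical illustration data: five consecutive days
df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=5, freq="D"),
    "num_sales": [20, 18, 25, 22, 19],
})

fig, ax = plt.subplots()
sns.barplot(data=df, x="date", y="num_sales", ax=ax)

# Each bar is a Rectangle patch; its center is the x position seaborn chose
bar_centers = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(bar_centers)
```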

A workaround for this is to use matplotlib's ax.bar directly:

# imports
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd

# generate dummy data
rng = np.random.default_rng()
size=100
vals = rng.normal(loc=0.02,size=size).cumsum() + 50
drange = pd.date_range("2014-01", periods=size, freq="D")
num_sales = rng.binomial(size=size,n=50,p=0.4)

# store data in a pandas DF
df = pd.DataFrame({'date': drange,
                    'average_daily_price': vals,
                    'num_sales': num_sales})

# setup axes
fig, ax1 = plt.subplots(figsize=(12,3))
# double y-axis is necessary due to the difference in the range of both variables
ax2 = ax1.twinx()
# plot the number of sales as a series of vertical bars
ax2.bar(df['date'], df['num_sales'], color='grey', alpha=0.5, label='Number of sales')
# plot the price as a time-series line plot
sns.lineplot(data=df, x='date', y='average_daily_price', ax=ax1)

# format the x-axis ticks as dates at weekly intervals
# (the date column is datetime64[ns]; byweekday=1 puts a tick on every Tuesday)
ax1.xaxis.set_major_locator(mpl.dates.WeekdayLocator(interval=1, byweekday=1))  # weekly
ax1.xaxis.set_major_formatter(mpl.dates.DateFormatter('%Y-%m-%d'))
# rotate the x-axis tick labels for readability
ax1.tick_params(axis='x', rotation=50)

The output is: [image: output from code]
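Since the question asks for no visible "num_sales" axis, one possible follow-up is to keep the twin axes for scaling but hide its y-axis. The sketch below is an assumption about what the asker wants, not part of the original answer; it uses made-up data shaped like the dummy data above and plain matplotlib calls (`Axis.set_visible`) to suppress the ticks and labels on the secondary axis.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data, mirroring the structure of the dummy data above
rng = np.random.default_rng(0)
size = 30
df = pd.DataFrame({
    "date": pd.date_range("2014-01-01", periods=size, freq="D"),
    "average_daily_price": rng.normal(loc=0.02, size=size).cumsum() + 50,
    "num_sales": rng.binomial(n=50, p=0.4, size=size),
})

fig, ax1 = plt.subplots(figsize=(12, 3))
# The twin axes still gives the bars their own scale
ax2 = ax1.twinx()
ax2.bar(df["date"], df["num_sales"], color="grey", alpha=0.5)
ax1.plot(df["date"], df["average_daily_price"])
ax1.set_ylabel("average_daily_price")

# Hide the ticks and tick labels of the secondary y-axis only
ax2.yaxis.set_visible(False)
```

The bars are still scaled independently of the price line; only the right-hand axis decoration is removed, so the reader sees a single y-axis for "average_daily_price".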

RMS
  • Rafael, thanks for the reply. I've run your code, please see the Edit. – Ebrin Oct 14 '22 at 18:22
  • Oh, it seems that the problem is in sns.barplot, which does not understand that the x-variable is a date. It then defaults to the standard date origin of 1970. When that happens, the x range is so different that the other part of the plot is hidden. – RMS Oct 14 '22 at 18:58
  • I am trying to see if I can figure it out – RMS Oct 14 '22 at 18:58
  • please try the updated code. – RMS Oct 14 '22 at 19:34
  • Great answer @RafaelMenezes. Any reason you used pandas bar plot instead of seaborn? Is it not possible to use sns for both by just using the same `ax`? – Redox Oct 15 '22 at 07:11
  • Thanks, @Redox. Seaborn is a package that provides a nice interface for making beautiful statistics-related plots, but under the hood it uses matplotlib. Since seaborn's `barplot` is meant for categorical variables, the x-axis basically gets transformed into a series of integers. It is probably possible to undo this transformation, but the best strategy in my opinion is to use matplotlib's `ax.bar` directly. – RMS Oct 15 '22 at 10:18