
Okay, so I've been stuck on this for the past 5 hours and I can't get this combo graph to plot correctly.

import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns


data = pd.read_csv('rating_conversion.csv')
df = pd.DataFrame(data)
overall_conversion_rate = df['overall_conversion_rate']
page_view_conversion = df['page_view_conversion']
Avg_Rating = df['avg_rating']
Total_Hired = df['total_hires']
df[:12]

fig, ax1 = plt.subplots(figsize=(10,6))
color = 'tab:green'
ax1.set_title('Total Hired and Avg Conversion % Per Rating Group', fontsize=16)
ax1.set_xlabel('Average_Rating', fontsize=16)
ax1.set_ylabel('Total_Hired', fontsize=16, color=color)
ax2 = sns.barplot(x=Avg_Rating, y=Total_Hired, data=df, palette='summer_r')
ax1.tick_params(axis='y')
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Avg Conversion %', fontsize=16, color=color)
ax2 = sns.lineplot(x=Avg_Rating, y=overall_conversion_rate, data=df, sort=False, color=color)
ax2.tick_params(axis='y', color=color)
plt.show()

This is what I keep getting: the bars and the line do not line up on the x axis.

My expectation is a chart where the Average Conversion % and Total Hired share the same x axis.

Please help. Here's the code that I've been using as my example:

#Libraries
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

#Data
#create list of months
Month = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'June',
'July', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
#create list for made up average temperatures
Avg_Temp = [35, 45, 55, 65, 75, 85, 95, 100, 85, 65, 45, 35]
#create list for made up average precipitation %
Avg_Precipitation_Perc = [.90, .75, .55, .10, .35, .05, .05, .08, .20, .45, .65, .80]
#assign lists to a value
data = {'Month': Month, 'Avg_Temp': Avg_Temp, 'Avg_Precipitation _Perc': Avg_Precipitation_Perc}
#convert dictionary to a dataframe
df = pd.DataFrame(data)
#Print out all rows
df[:12]


#Create combo chart
fig, ax1 = plt.subplots(figsize=(10,6))
color = 'tab:green'
ax1.set_title('Average Precipitation Percentage by Month', fontsize=16)
ax1.set_xlabel('Month', fontsize=16)
ax1.set_ylabel('Avg Temp', fontsize=16, color=color)
ax2 = sns.barplot(x='Month', y='Avg_Temp', data = df, palette='summer')
ax1.tick_params(axis='y')
ax2 = ax1.twinx()
color = 'tab:red'
ax2.set_ylabel('Avg Precipitation %', fontsize=16, color=color)
ax2 = sns.lineplot(x='Month', y='Avg_Precipitation _Perc', data = df, sort=False, color=color)
ax2.tick_params(axis='y', color=color)
plt.show()

This is the content of rating_conversion.csv https://paste.ubuntu.com/p/8w63wP2z9J/

busybear
KwyjiboChris

1 Answer


The issue is that barplot doesn't plot at your actual x values. It treats the x variable as categorical, so the bars are drawn at x = 0, 1, 2, etc. lineplot, on the other hand, uses the actual x values. You'll notice in your figure that the line is squeezed between the 5th and 6th bars, which sit at x = 4 and x = 5. That is exactly where your ratings (4.0, 4.2, and so on) fall on a numeric axis.
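To make the positional mismatch concrete, here is a minimal sketch that reads back where seaborn actually draws the bars. The three-row `demo` frame is made up for illustration (it is not the asker's CSV), and the `Agg` backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display the figure
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: numeric ratings used as the x variable
demo = pd.DataFrame({"avg_rating": [4.0, 4.2, 4.5],
                     "total_hires": [10, 25, 40]})

fig, ax = plt.subplots()
sns.barplot(x="avg_rating", y="total_hires", data=demo,
            color="tab:green", ax=ax)

# Centre of each bar = left edge + half the bar width
bar_centers = sorted(round(p.get_x() + p.get_width() / 2, 6)
                     for p in ax.patches)
print(bar_centers)  # bars sit at 0, 1, 2 rather than at 4.0, 4.2, 4.5
```

A line drawn at x = 4.0, 4.2, 4.5 on the same axis would therefore land to the right of all three bars.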

A workaround is to use pointplot instead of lineplot. pointplot treats the x variable as categorical, similar to barplot.

You can still use lineplot if you recalculate your x values so they start at 0 and go up by 1 for each distinct value. An easy way to do this is the pandas rank method with method='dense', minus 1 because ranks start at 1. This is easier than changing barplot to plot at the actual x values.
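As a small illustration of the dense-rank idea, the sketch below maps a few made-up rating values (hypothetical, not taken from the CSV) to the 0-based positions that barplot uses. Repeated values share the same position.

```python
import pandas as pd

# Hypothetical ratings; 4.2 appears twice on purpose
ratings = pd.Series([4.0, 4.2, 4.2, 4.5, 4.8])

# Dense rank gives 1, 2, 2, 3, 4; subtract 1 to start at 0
positions = ratings.rank(method="dense") - 1
print(positions.tolist())  # [0.0, 1.0, 1.0, 2.0, 3.0]
```

Note that this only matches the bar order when the bars are drawn in ascending order of the x values.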

As an aside, you aren't quite using twinx properly. It works but you aren't utilizing ax2. I'll illustrate in the example.

The issue

import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset("tips")

fig, ax = plt.subplots()
ax_twin = ax.twinx()
sns.barplot(x="day", y="total_bill", data=tips, ax=ax)
sns.lineplot(x='day', y='total_bill', data=tips, err_style='bars', ax=ax_twin)


Using pointplot:

fig, ax = plt.subplots()
ax_twin = ax.twinx()
sns.barplot(x="day", y="total_bill", data=tips, ax=ax)
sns.pointplot(x='day', y='total_bill', data=tips, ax=ax_twin)
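Applied to the question's column names, the pointplot approach might look roughly like the sketch below. The data frame is a made-up stand-in for rating_conversion.csv (the values are hypothetical), and the colors and labels are simply carried over from the question. Each seaborn call is given its axis explicitly through `ax=`, which is the twinx usage mentioned above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove to display the figure
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for rating_conversion.csv
df = pd.DataFrame({
    "avg_rating": [4.0, 4.2, 4.5, 4.8],
    "total_hires": [12, 30, 45, 20],
    "overall_conversion_rate": [0.10, 0.18, 0.25, 0.15],
})

fig, ax1 = plt.subplots(figsize=(10, 6))
ax2 = ax1.twinx()  # second y axis sharing the same x axis

# Both plots treat avg_rating as categorical, so they align at 0, 1, 2, ...
sns.barplot(x="avg_rating", y="total_hires", data=df,
            color="tab:green", ax=ax1)
sns.pointplot(x="avg_rating", y="overall_conversion_rate", data=df,
              color="tab:red", ax=ax2)

ax1.set_title("Total Hired and Avg Conversion % Per Rating Group", fontsize=16)
ax1.set_xlabel("Average Rating", fontsize=16)
ax1.set_ylabel("Total Hired", fontsize=16, color="tab:green")
ax2.set_ylabel("Avg Conversion %", fontsize=16, color="tab:red")
```

Call plt.show() (with an interactive backend) or fig.savefig(...) afterwards to see the result.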


Using lineplot and rank

tips['rank'] = tips['size'].rank(method='dense') - 1

fig, ax = plt.subplots()
ax_twin = ax.twinx()
sns.barplot(x='size', y='total_bill', data=tips, ax=ax)
sns.lineplot(x='rank', y='total_bill', data=tips, err_style='bars', ax=ax_twin)


busybear
  • Thank you so much. Using pointplot worked and fixed my issue. Anyway, is there a way for lineplot to actually use the x axis categories and not the actual x values? like getting the index or something? I'm very new to python and would like to know if there's a workaround to this – KwyjiboChris Oct 17 '20 at 13:13
  • You can replace all your `4.0`s with `0`, `4.2`s with `1`, etc. An easy way to do this is to use the [`rank`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rank.html) method. Set `method` to `'dense'`, and you'll need to subtract 1 since rank starts at 1. – busybear Oct 17 '20 at 13:24