1

My goal is to create ratios from two filtered columns in a Pandas data frame, then use Plotly Express to create a bar chart using px.bar(). I'm able to do so using the base plot() function in Pandas, but not the px.bar() function in Plotly Express.

One problem that I ran into was that some of the columns contain duplicate values. This resulted in my having to do some Pandas gymnastics.

Here is my data:

test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
                          'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
                          'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
                          'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
                          'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
                       })

Due to some duplicate values, I create a temporary table:

temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')

Then, reset the index:

df_new = temp_table.reset_index()

Then,

s1 = df_new.set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2 = df_new.set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value

Then, unstack and plot:

temp_frame = s1.div(s2).unstack()
temp_frame.plot(kind='bar')

This works perfectly and produces the following bar plot using the standard Pandas plot() function:

enter image description here

Now, I attempt to plot using the px.bar() function in Plotly Express:

px.bar(temp_frame, x='Sector', y='Value', color='Exchange',
                            barmode='group',
                            text='Value',
                            title='Order to Sales Ratio)

This code results in the following error message:

ValueError: Value of 'x' is not the name of a column in 'data_frame'. Expected one of ['Germany', 'USA'] but received: Sector

This error looks related to the issue reported in Use Pandas index in Plotly Express. But, I think my data frame is not configured in such a way that I can implement the "ugly fix" solution suggested by @Laurens Koppenol and validated by @nicolaskruchten.

Can anyone help me resolve this error so that I can create the bar plot above using Plotly Express?

Thanks in advance!

equanimity
  • 2,371
  • 3
  • 29
  • 53

1 Answers1

2

Plotly version 4.8 allows us to do this!

https://community.plotly.com/t/announcing-plotly-py-4-8-plotly-express-support-for-wide-and-mixed-form-data-plus-a-pandas-backend/40048

According to the docs:

"...Plotly Express can now behave a bit more like the default Pandas plotting backend with respect to the x and y parameters of the corresponding functions. In light of this, we are taking advantage of the new pandas.options.plotting.backend option introduced in Pandas v0.25, and offering an official plotly backend for Pandas plotting. This means that you can import Pandas as you usually do, set the plotting backend to "plotly", and when you call df.plot(), Plotly Express is invoked, and a plotly.graph_objects.Figure object is returned, ready to be customized, rendered, or passed into Dash Core Components’ Graph component."

The code to do so is:

import pandas as pd
pd.options.plotting.backend='plotly'

test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
                          'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
                          'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
                          'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
                          'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
                       })


temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')


df_new = temp_table.reset_index()


s1 = df_new.set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2 = df_new.set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value

temp_frame = s1.div(s2).unstack()


fig = temp_frame.plot(kind='bar')
fig.update_layout(barmode='group')
fig.show()
equanimity
  • 2,371
  • 3
  • 29
  • 53