How to compute a ratio between two filtered variables in a 'tidy' Pandas data frame

Question

I have the following Pandas data frame in 'tidy' (long) format:

df = pd.DataFrame({'Manufacturer':['Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
                          'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
                          'Sector':['Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
                          'Value':[45000, 70000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
                       })

I'm trying to compute the ratio between Orders to Sales, broken down by Manufacturer and Sector.

I've tried a few approaches (e.g. groupby, transform, etc.) based on the answers provided in Pandas percentage of total with groupby, but was unsuccessful with any of these (likely due to the data being structured a bit differently, and the lack of filtering required).

The ultimate goal is to make a bar plot displaying the Orders to Sales ratio broken down by Sector and Manufacturer, similar to the following:

Question: how would I compute the ratio between Orders to Sales (broken down by Manufacturer and Sector) as a means to plot the result?

Thanks in advance!

/////////////////////////////////////////////////// UPDATE/////////////////////////////////////////////

If we amend the data frame to include another column which results in duplicate data in one of the indices, we get an error.

Here is the updated data frame with a new City column:

test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
                          'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
                          'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
                          'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
                          'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
                       })

Running the following:

s1=df.set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2=df.set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value

s1.div(s2).unstack().plot(kind='bar')

Results in the following error:

`ValueError: Index contains duplicate entries, cannot reshape`

The reason is because of these duplicates:

Manufacturer   Metric   Sector    Value    City
Ford           Orders   Germany   45000    Frankfurt
Ford           Orders   Germany   70000    Bremen

I tried to treat this use case with the following from Pandas unstack problems: ValueError: Index contains duplicate entries, cannot reshape:

df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric'], aggfunc='sum')

But, I was unsuccessful and still get the error message about duplicate values.

Does anyone have any ideas about how to solve this?

Thanks in advance!

score 0 · Answer 1 · answered Jun 12 '20 at 14:41

I created the code as a graph comparing orders and sales by manufacturer and country of production.

sales = df[df['Metric'] == 'Sales'].set_index(['Manufacturer','Sector']).groupby(level=[0,1])['Value'].sum().to_frame('Sales')

orders = df[df['Metric'] == 'Orders'].set_index(['Manufacturer','Sector']).groupby(level=[0,1])['Value'].sum().to_frame('Orders')
df2 = orders.merge(sales, on=['Manufacturer','Sector'])

df2
              Orders    Sales
Manufacturer Sector     
BMW      Germany    65000   3400
         USA        63000   5700
Ford     Germany    115000  2700
         USA        40000   3000
Mercedes Germany    90000   4400
         USA        65000   4700

df2.plot(kind='bar')

score 0 · Answer 2 · answered Jun 12 '20 at 19:27

After some additional testing, I have found a solution:

test_df = pd.DataFrame({'Manufacturer':['Ford', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW', 'Ford', 'Mercedes', 'BMW'],
                          'Metric':['Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Orders', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Sales', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty', 'Warranty'],
                          'Sector':['Germany', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA', 'Germany', 'Germany', 'Germany', 'USA', 'USA', 'USA'],
                          'Value':[45000, 70000, 90000, 65000, 40000, 65000, 63000, 2700, 4400, 3400, 3000, 4700, 5700, 1500, 2000, 2500, 1300, 2000, 2450],
                          'City': ['Frankfurt', 'Bremen', 'Berlin', 'Hamburg', 'New York', 'Chicago', 'Los Angeles', 'Dresden', 'Munich', 'Cologne', 'Miami', 'Atlanta', 'Phoenix', 'Nuremberg', 'Dusseldorf', 'Leipzig', 'Houston', 'San Diego', 'San Francisco']
                       })

temp_table = test_df.reset_index().pivot_table(values = 'Value', index = ['Manufacturer', 'Metric', 'Sector'], aggfunc='sum')


df_new = temp_table.reset_index()

s1=df_new.set_index(['Manufacturer','Sector']).query("Metric=='Orders'").Value
s2=df_new.set_index(['Manufacturer','Sector']).query("Metric=='Sales'").Value

s1.div(s2).unstack().plot(kind='bar')

Thanks to @YOBEN_S for the solution to the original data set and for inspiring this solution!

How to compute a ratio between two filtered variables in a 'tidy' Pandas data frame

2 Answers2