How can I print a statement in Python to show the results of a mathematical operation on a Pandas Dataframe?

Question

So I've got a simple summary dataframe of sales by gender that goes:

Gender |    Sales
___________________
M      |    25
F      |    30

All I want to do now is to return a line in Python that reads: The mean gap in the amount sold is 16.67%

This is simply 30 - 25 divided by 30, and multiplied by 100; and I want a % sign at the end.

I have tried:

m_sales = df.loc[df['Gender'] == 'M']
f_sales = df.loc[df['Gender'] == 'F']

print('The mean gap in the amount sold is:', m_sales['Sales'] - f_sales['Sales'] / m_sales['Sales'] * 100, '%')

Unfortunately this does not work. I get:

The mean gap in the amount sold is: 0 NaN 1 NaN Name: Sales, dtype: object %

Thoughts please? I am very much a beginner so sorry for such a basic query!

Seems like an straight-forward `loc` usage. Have you tried it? — rafaelc, Oct 09 '20 at 16:20
This is very easy. What things have your tried, and where did you get stuck? You have to try your own code and post it. It's not allowed on SO to just post a spec and ask other people to write code for you. — smci, Oct 09 '20 at 16:21
Does this answer your question? [pandas get column average/mean](https://stackoverflow.com/questions/31037298/pandas-get-column-average-mean) — ayrusme, Oct 09 '20 at 16:22
@smci - apologies didn't mean to just leave it as-is, was in a hurry and forgot to write my own code. Have updated my query. — SQLGIT_GeekInTraining, Oct 09 '20 at 16:36
@thatrockbottomprogrammer - No - I have already used `groupby()` and `mean()` on my previous dataset to calculate the mean sales, where I'm getting stuck is just printing out my result as I've explained above. — SQLGIT_GeekInTraining, Oct 09 '20 at 16:39
Since you essentially have a single column (Sales) indexed by Gender, rather than a full dataframe, it's cleanest to read in with `read_csv(..., index_col=[0], squeeze=True)`. Then you can directly use `.loc['M']` like you want to. — smci, Oct 09 '20 at 17:22
You can simplify your formula to `(1 - sl.loc['M']/sl.loc['F'])`. And in fact if it's given that there are exactly two Genders and you're only interested in the relative diff, without knowing which gender sold more, you could use `1 - float(sl.diff().tail(1) / sl.max())`. — smci, Oct 09 '20 at 17:25
If you want to take things one step further and reduce to `sl.to_dict()`, now you can directly reference sl['M']/sl['F']. This is the clearest and most compact, but it loses the pandas type and methods, also hardcodes the genders. — smci, Oct 09 '20 at 18:13
Also, [f-strings](https://docs.python.org/3/reference/lexical_analysis.html#f-strings) even have a datatype for percent: `:.2%`, so we don't need the `100. * (...)` boilerplate either. See my answer. — smci, Oct 09 '20 at 18:22

MarianD · Accepted Answer · 2020-10-09T17:12:51.317

0

Append ["Sales"].iloc[0] to your filter expressions to obtain directly values for M and F, and then project these changes to the print() function, too:

m_sales = df.loc[df['Gender'] == 'M']["Sales"].iloc[0]
f_sales = df.loc[df['Gender'] == 'F']["Sales"].iloc[0]

print('The mean gap in the amount sold is:', (f_sales - m_sales) / f_sales * 100, '%')

The mean gap in the amount sold is: 16.666666666666664 %

The explanation:

df.loc[df['Gender'] == 'M'] is a dataframe;
selecting the "Sales" column by appending ["Sales"] you obtain a series (with only 1 element), and
by appending .iloc[0] you obtain the first (=the only one) element of that series.

Note:

You may use f-string (for Python 3.6+) or .format() method for adjusting your output, e.g.

print(f'The mean gap in the amount sold is: {(f_sales - m_sales) / f_sales * 100:.2f}%')

The mean gap in the amount sold is: 16.67%

edited Oct 09 '20 at 17:12

answered Oct 09 '20 at 16:59

MarianD

13,096
12
42
54

Makes perfect sense - thank you, I can see what I was missing. However, I get the following error: --------------------------------------------------------------------------- TypeError Traceback (most recent call last) in 2 f_sales = df.loc[df['Gender'] == 'F']["Sales"].iloc[0] 3 ----> 4 print('The mean gap in the amount sold is:', (m_sales - f_sales) / m_sales * 100, '%') TypeError: unsupported operand type(s) for -: 'str' and 'str' – SQLGIT_GeekInTraining Oct 09 '20 at 17:13
I tested it and it worked OK. You probably have string values in your column "Sales" (try `df.dtypes` to see data types of your columns). Use `int()` or `float()` functions to convert the selected values for `m_sales` and `f_sales` into numbers: `m_sales = int(m_sales)`; `f_sales = int(f_sales)` before the `print()` function. – MarianD Oct 09 '20 at 17:26
Yes - I think when I was testing it in the sandbox I had generated a fresh dataframe but forgot to check for data types. Works like a charm - special thanks to you for such a clean and crisp explanation -really helps a beginner like me! – SQLGIT_GeekInTraining Oct 09 '20 at 17:27

smci · Answer 2 · 2020-10-09T18:21:57.517

Ok, you want to be able to index your sales directly by gender (using .loc[]), so for that we read your dataframe with index_col=[0] to set index to Gender column, then squeeze=True to reduce the remaining 1-column dataframe into a Series.

Then I use an f-string for formatting. Note we can inline your expression inside the f-string:

import pandas as pd
from io import StringIO    

dat = """\
Gender |    Sales
___________________
M      |    25
F      |    30
"""

sl = pd.read_csv(StringIO(dat), sep='\s*\|\s*', skiprows=[1], index_col=[0],
    engine='python', squeeze=True)

#               Sales
# Gender            
# M               25
# F               30

print(f"The mean gap in the amount sold is: {100.*(1 - sl.loc['M']/sl.loc['F']):.2f}%")
# The mean gap in the amount sold is: 16.67%

# ...but f-strings even have a datatype for percent: `:.2%`, so we don't need the `100. * (...)` boilerplate.
print(f"The mean gap in the amount sold is: {(1 - sl.loc['M']/sl.loc['F']):.2%}")
The mean gap in the amount sold is: 16.67%

...and if you want to take things one step further and reduce the df -> Series -> dict, do sl.to_dict(), now you can directly reference sl['M']/sl['F'] like you might want to (obviously we lose all the rich methods of a Series.)

This is very interesting - hadn't considered something like this.. will revisit this as it's going to take me a little while to digest :) — SQLGIT_GeekInTraining, Oct 09 '20 at 17:28
Sure. In general try to aggressively reduce the dimensionality and eliminate any unnecessary columns. NA removal, filtering, aggregating and setting the index usually do the trick. — smci, Oct 09 '20 at 17:30
If you want to take things one step further and reduce the df -> Series -> dict, do `sl.to_dict()`, now you can directly reference `sl['M']/sl['F']` like you might want to. (obviously we lose all the rich methods of a Series.) — smci, Oct 09 '20 at 18:09

How can I print a statement in Python to show the results of a mathematical operation on a Pandas Dataframe?

2 Answers2