1

In order to avoid dividing by zero, the following logic is aimed to calculate a percentage change and skips the values where the prior month has 0 in values. However, the following would yield all percentage change to null even though there are valid non-NaN non-zero numbers in the referred columns.

df_ex['metric_diff'] = (df_ex['metric_lastmonth'] - df_ex['metric_thismonth'])
try:
    df_ex['metric_pctdiff'] = (df_ex['metric_lastmonth'] - df_ex['metric_thismonth'])/df_ex['metric_thismonth']
except ZeroDivisionError:
    df_ex['metric_pctdiff'] = np.nan

print(len(df_ex[df_ex['metric_diff'].notna()]))

521

print(len(df_ex[df_ex['metric_pctdiff'].notna()]))

0

The outputs indicate that there are nominal difference with non-NaN values, yet the percentage difference comparison yields all NaN values when compared the same two columns. Is there a logic error I made? What went wrong here?

Additional details: here is a sample of the Dataframe. enter image description here

and in this example, there are two rows where we have non-null values from last month. enter image description here

and... the desired outputs should be like in the following: enter image description here

  • I interpret your results the following way: (1) `df_ex['metric_diff']` contains roughly 500 items that are **not** a `NaN`. (2) `df_ex['metric_pctdiff']` contains **only** `NaN`s. Is that correct? If so, then it's a bit strange, indeed. Could you add a small sample, a so-called MRE, to the question that reproduces those results (see [here](https://stackoverflow.com/help/minimal-reproducible-example) and [here](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples))? – Timus May 31 '23 at 11:06
  • `try`-`except` doesn't work row-wise, it's an either-or globally. Ie. if the `try`-part raises one `ZeroDivisionError`, then **all** the items in `df_ex['metric_pctdiff']` are set to `np.nan`. – Timus May 31 '23 at 11:08
  • Thank you Timus. I did not know that try - except is either-or globally. A MRE looks like the following: df_ex[0:10][['Identifier', 'metric_thismonth', 'metric_lastmonth']] – Pythonlearner Jun 01 '23 at 13:46
  • Identifier metric_thismonth metric_lastmonth ID_5AE9 NaN NaN ID_5AG4 NaN NaN ID_5AJ8 NaN NaN ID_JAA6 NaN NaN ID_XAD6 0 0 ID_VAC4 NaN NaN ID_VAE0 NaN NaN ID_VAG5 NaN NaN ID_VAJ9 NaN NaN ID_5AX9 4.21232 11.591 – Pythonlearner Jun 01 '23 at 13:47
  • Please edit samples into the question, don't provide them in comments. Imho your sample acutally doesn't reproduce your problem: I get `1` for `len(df_ex[df_ex['metric_pctdiff'].notna()])`, which is what I would expect? – Timus Jun 01 '23 at 14:06
  • Timus I ended up finding a solution from an earlier post https://stackoverflow.com/questions/69489664/zero-division-in-one-row-causing-error-in-all-other-rows-pandas/76383448#76383448. For some reason, using fillana(o) resolves the issue I was having – Pythonlearner Jun 01 '23 at 15:56

2 Answers2

0

Pandas will return inf when you divide by zero, and NaN when you divide zero by zero, that's why your ZeroDivisionError is not being caught and you got a return value NaN.

We could handle the case you want manually with the code below, it will calculate the percentage difference for non-zero 'metric_thismonth' and place NaN for zero 'metric_thismonth'

edit: lets try with a lambda function inside the apply() function

import pandas as pd
import numpy as np

df_ex['metric_diff'] = df_ex['metric_lastmonth'] - df_ex['metric_thismonth']

df_ex['metric_pctdiff'] = df_ex.apply(lambda row: (row['metric_diff'] / row['metric_lastmonth']) 
                                       if row['metric_lastmonth'] != 0 
                                       else np.nan, axis=1)
Saxtheowl
  • 4,136
  • 5
  • 23
  • 32
  • Thanks Saxtheowl. I tried the following with the requirement of calculation on non-null items but it still gave me an ZeroDivisionError: float division by zero error. df_ex['metric_pctdiff'] = np.where((df_ex['metric_lastmonth'].notnull() & df_ex['metric_diff'] != 0), df_ex['DLQ_60+_diff']/df_ex['metric_lastmonth'], np.nan) – Pythonlearner Jun 01 '23 at 13:56
0

Using .fillna(0) on the both columns resolves the problem (as described on the solution to another question):

df_ex['metric_pctdiff'] = df_ex['metric_diff'].fillna(0) / df_ex['metric_lastmonth'].fillna(0)
Kyle F Hartzenberg
  • 2,567
  • 3
  • 6
  • 24