Pandas dataframe conditional formatting still leads to "division by zero" error even though it... shouldn't?

Question

I'm trying to add a new column containing the result of dividing two other columns, while avoiding division by zero by applying the division only to rows where the denominator is greater than zero:

df['division'] = 0

df.loc[(df['B'] > 0), 'division'] = (df['A'] / df['B'])

It works just fine if you first set df = df.head(X), with X chosen so that none of the remaining rows has a zero in 'B'. So I know the conditional selection is what fails, but I don't understand why.

Is there a non-obvious reason why this conditional selection doesn't work? The selection on its own works fine:

df.loc[df['B'] > 0]

and returns the DataFrame you would expect.

The reason is that (df['A'] / df['B']) gets evaluated on the whole DataFrame regardless of the condition, and **only then** are the rows that fulfill the condition assigned values from that result. Note that df['A'] / df['B'] makes no mention of the condition. — Uretki, Jul 12 '22 at 14:07
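A minimal sketch of the mechanism described in the comment, assuming the columns have object dtype (created here with astype('object'), as in Answer 1). The unmasked right-hand side raises before .loc ever applies the mask, while dividing only the masked rows does not. The names mask and raised are illustrative only.

```python
import pandas as pd

# Assumed setup: numbers stored in object-dtype columns, so each element is
# divided with Python's own `/` and a zero denominator raises ZeroDivisionError
df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 1, 2]}).astype('object')
mask = df['B'] > 0

raised = False
try:
    # Python evaluates df['A'] / df['B'] for every row before .loc sees the mask
    df.loc[mask, 'division'] = df['A'] / df['B']
except ZeroDivisionError:
    raised = True

# Apply the mask to both operands so the zero denominator is never used;
# astype(float) turns the object-dtype quotients into float64
df['division'] = 0.0
df.loc[mask, 'division'] = (df.loc[mask, 'A'] / df.loc[mask, 'B']).astype(float)
print(raised)
print(df)
```

Masking the operands rather than only the assignment target means the division is never attempted on the rows you want to skip, whatever the dtype.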

constantstranger · Answer 1 · 2022-07-12T15:06:53.080

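Note that pandas supports division by zero for columns with a numeric dtype (such as float64 and int64) by returning ±inf (or NaN for 0/0). For columns of object dtype, however, the elements are divided with Python's own division, which raises a ZeroDivisionError exception. So if 'A' and 'B' in the question have object dtype, the full-column division on the right-hand side raises before .loc can apply the mask.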

Example:

import pandas as pd
df = pd.DataFrame({'A':[1,2,3,4,5], 'B':[0,1,2,3,4]})
print(df)

print('', 'result for:', f'{df.dtypes}:', sep='\n')
print(df['A'] / df['B'])
df = df.astype('float')
print('', 'result for:', f'{df.dtypes}:', sep='\n')
print(df['A'] / df['B'])
df = df.astype('object')
try:
    print('', 'result for:', f'{df.dtypes}:', sep='\n')
    print(df['A'] / df['B'])
except (ZeroDivisionError):
    print('raised ZeroDivisionError exception')

Output:

   A  B
0  1  0
1  2  1
2  3  2
3  4  3
4  5  4

result for:
A    int64
B    int64
dtype: object:
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64

result for:
A    float64
B    float64
dtype: object:
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64

result for:
A    object
B    object
dtype: object:
raised ZeroDivisionError exception

One possible solution is to set the dtype of the columns you plan to divide to a numeric type such as float:

try:
    print('', 'result for:', f'{df.dtypes}:', sep='\n')
    print('first change column types to float')
    df.A = df.A.astype('float')
    df.B = df.B.astype('float')
    print(df['A'] / df['B'])
except (ZeroDivisionError):
    print('raised ZeroDivisionError exception')

Output:

result for:
A    object
B    object
dtype: object:
first change column types to float
0         inf
1    2.000000
2    1.500000
3    1.333333
4    1.250000
dtype: float64
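As a sketch of how this resolves the original question, assuming the question's 'A' and 'B' are object-dtype columns holding numbers: after converting them with pd.to_numeric, the original .loc assignment runs. Row 0 of the full division is still inf, but the mask simply discards it. Initializing the column with 0.0 instead of 0 is a choice made here (not in the question) so the column is float from the start.

```python
import pandas as pd

# Assumed starting point: numeric values stored in object-dtype columns
df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 1, 2]}).astype('object')

# Convert the operands to a numeric dtype (int64 is inferred here)
df['A'] = pd.to_numeric(df['A'])
df['B'] = pd.to_numeric(df['B'])

# The question's approach now works: the full division yields inf in row 0,
# but .loc only writes the rows where B > 0
df['division'] = 0.0
df.loc[df['B'] > 0, 'division'] = df['A'] / df['B']
print(df)
```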

score 0 · Answer 2 · answered Jul 12 '22 at 14:07

I would keep all the rows and put np.nan wherever the denominator is 0:

import numpy as np
df['division'] = np.where(df['B'] != 0, df['A'] / df['B'], np.nan)
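A short sketch, assuming numeric (int64) columns: np.where still evaluates df['A'] / df['B'] for every row (producing inf in row 0) and only afterwards replaces that entry with NaN, so on its own it avoids an error only for numeric dtypes. The division_alt column is an alternative added here for comparison, not part of the original answer: it masks zero denominators with Series.where before dividing, so the division itself never sees a zero.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [0, 1, 2]})

# np.where computes both branches in full, then picks NaN where B == 0
df['division'] = np.where(df['B'] != 0, df['A'] / df['B'], np.nan)

# Alternative: turn zero denominators into NaN first (B becomes float64),
# so every row divides by a non-zero number or by NaN
df['division_alt'] = df['A'] / df['B'].where(df['B'] != 0)
print(df)
```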

answered Jul 12 '22 at 14:07 by Alessandro
