I am currently performing a row-wise calculation on a pd.DataFrame using df.apply(foo, axis=1), where foo is effectively as follows:

def foo(row):
    n = row['A']
    d = row['B']

    if n <= 0:
        return 0
    if d <= 0:
        return 100
    return n / d * 100

This seems to be begging to be simplified into an np.where.

I have other cases with only one if statement (i.e. if n <= 0), which I have already simplified into

np.where(df['A'] <= 0, 0, df['A'] / df['B'])

However, I can't see how to do the same with the double-if case. At least not elegantly. I could do

np.where(df['A'] <= 0, 0, np.where(df['B'] <= 0, 100, df['A'] / df['B'] * 100))

But this would seem to run through the entire dataframe twice, once for each np.where call.
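To check that the nested version at least matches foo, here is a minimal sketch on a made-up five-row frame. The sample values and the names `expected` and `nested` are illustrative assumptions, not part of my real data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering every branch of foo.
df = pd.DataFrame({'A': [-1, 5, 5, 0, 3],
                   'B': [2, 0, 4, 0, -1]})

def foo(row):
    n = row['A']
    d = row['B']
    if n <= 0:
        return 0
    if d <= 0:
        return 100
    return n / d * 100

# Row-wise reference result (one Python-level call per row).
expected = df.apply(foo, axis=1)

# Nested np.where: the outer condition takes priority, as in foo.
# The division is evaluated for every row, but rows caught by
# either condition never use its value.
nested = np.where(df['A'] <= 0, 0,
                  np.where(df['B'] <= 0, 100, df['A'] / df['B'] * 100))

print(nested.tolist() == expected.tolist())
```

Each np.where call does make a full pass over its inputs, but each pass is a compiled loop rather than one Python function call per row.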

Is there a better way to do this? Alternatively, is the vectorization that np.where brings significant enough that running through the table twice with np.where is still better than running through it once with df.apply?
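One candidate I am aware of is np.select, which takes a list of conditions and a matching list of choices and uses the first condition that holds for each element. The following is only a sketch on hypothetical sample data (the frame and the names `ratio`, `conditions`, `choices` and `result` are my own assumptions), and I have not measured whether it is faster than the nested np.where.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data covering every branch of foo.
df = pd.DataFrame({'A': [-1, 5, 5, 0, 3],
                   'B': [2, 0, 4, 0, -1]})

# Computed for every row; rows matched by a condition below
# ignore this value (including inf/NaN from division by zero).
ratio = (df['A'] / df['B'] * 100).to_numpy()

# Conditions are checked in order, so A <= 0 wins over B <= 0,
# mirroring the order of the if statements in foo.
conditions = [(df['A'] <= 0).to_numpy(), (df['B'] <= 0).to_numpy()]
choices = [0.0, 100.0]

result = np.select(conditions, choices, default=ratio)
print(result.tolist())
```

This keeps the branches flat instead of nested, which may read more clearly as the number of cases grows.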

Wasabi