0

I have a dataframe (dayData), with two columns 'first' and 'average'. I am looking to divide 'first' by 'average' to create a new column 'second'.

Using the following:

dayData["second"] = dayData["first"] / dayData["average"]

However, 'average' can contain a value of 0, so when I divide the two columns I get a 'NaN' value. I would like to replace the 'NaN' value with zero. Is there a quick way to do this?

Thanks

Stacey
  • Try `(dayData["first"] / dayData["average"]).fillna(0)` or `dayData['first'].div(dayData['average'], fill_value=0)` – EdChum Jun 25 '16 at 20:47
  • As EdChum's solution shows, I think your main question is essentially just a duplicate of this: http://stackoverflow.com/questions/13295735/how-can-i-replace-all-the-nan-values-with-zeros-in-a-column-of-a-pandas-datafra – Alex Riley Jun 25 '16 at 20:49
  • Shouldn't a division by zero yield `inf`? In that case you have to do `dayData['second'].replace(np.inf, 0)` – EdChum Jun 25 '16 at 20:56
  • @EdChum No, division by zero should truly be NaN unless you know more about the denominator. The reason is that the limit as the denominator approaches zero from the positive side is infinity, whereas from the negative side it's negative infinity. So as we try to derive intuition about what the answer should be, we arrive at diametrically opposed options. If we knew the denominator were always positive, like a standard deviation, then we could safely say infinity. – piRSquared Jun 25 '16 at 22:48
  • @piRSquared OK, I think my confusion came from some kind of float imprecision: `df = pd.DataFrame({'a':np.random.randn(5), 'b':np.arange(5)}); df['a']/df['b']` will return `inf` for the first row, whilst `df = pd.DataFrame({'a':np.arange(5), 'b':np.arange(5)}); df['a']/df['b']` gives `NaN` at the first row – EdChum Jun 26 '16 at 06:32

2 Answers

5

Your assumption is not entirely correct. You get NaN only when dividing zero by zero. If the numerator is non-zero, you get inf (or -inf for a negative numerator). Example:

x = pd.DataFrame(data={'a': [0, 1], 'b':[0, 0]})
x['a'] / x['b']

gives us:

0    NaN
1    inf
dtype: float64

If you just want to remove NaNs then EdChum's answer is the one you need:

dayData["second"] = (dayData["first"] / dayData["average"]).fillna(0)

However, if you're getting infinities, you might need to replace them as well. Note that replace returns a new Series, so assign the result back:

dayData["second"] = dayData["second"].replace([np.inf, -np.inf], 0)
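Putting the two steps together, here is a minimal sketch that handles NaN, inf and -inf in one pass. The sample values are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dayData = pd.DataFrame({
    "first":   [0.0, 1.0, -2.0, 6.0],
    "average": [0.0, 0.0,  0.0, 3.0],
})

# 0/0 -> NaN, 1/0 -> inf, -2/0 -> -inf, 6/3 -> 2.0
ratio = dayData["first"] / dayData["average"]

# Turn the infinities into NaN first, then fill every NaN with 0.
dayData["second"] = ratio.replace([np.inf, -np.inf], np.nan).fillna(0)
```

Be aware that this also zeroes any NaN that was already present in 'first' or 'average', which may or may not be what you want.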
yousraHazem
1

Sometimes the DataFrame's dtype is object. In that case, dividing by zero raises a divide-by-zero exception instead of returning NaN. You can convert the data to the desired dtype first. For example, if the values are integers:

dayData = dayData.astype(int)

Now the division gives NaN (or inf) rather than an exception.
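A small sketch of the difference, using a made-up object-dtype frame (the name df and its values are hypothetical). As far as I know, the exception raised in the object case is Python's ZeroDivisionError, because the division falls back to plain Python arithmetic element by element.

```python
import pandas as pd

# Hypothetical frame whose columns hold Python ints with dtype=object.
df = pd.DataFrame({"first": [0, 1, 6], "average": [0, 0, 3]}, dtype=object)

try:
    df["first"] / df["average"]
    raised = False
except ZeroDivisionError:
    # Object columns are divided element by element in plain Python.
    raised = True

# After converting to a numeric dtype, pandas handles the zeros itself.
df = df.astype(int)
result = df["first"] / df["average"]  # NaN, inf, 2.0
```

If the columns might contain non-numeric strings, astype(int) will fail, so in that situation something like pd.to_numeric with errors="coerce" may be a better fit.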

user1953366