0

Example Dataset:

var1  var2  result
1     4     5
2     NaN   2
3     5     8
NaN   6     6
NaN   NaN   NaN

I would like to create a new variable by summing var1 & var2, as seen above. However - I don't want to just replace NaN in var2 with 0s.

This question was closed because someone pointed me towards a question that would fill var2 with zeros. I need the sum to work if either are NaN, and if var1 & var2 are both NaN, I need the resulting new variable to by NaN. I feel like this wasn't answered by the question duplicate I was directed to.

Thanks in Advance

Josh Gale
  • 39
  • 4

2 Answers2

3

A straightforward solution:

def mysum(var1, var2):
    if math.isnan(var1):
        return var2
    else:
        if math.isnan(var2):
            return var1
        else:
            return var1 + var2

If you want it to support a list of variables:

def mysum(vars):
    vars = [x for x in vars if not math.isnan(x)]  # delete all nans
    if vars:
        return sum(vars)
    else:
        return float('nan')

Note: numpy happens to have a function nansum which does exactly that, but its behavior was later changed for the case of all nans, so better define your own one.

anatolyg
  • 26,506
  • 9
  • 60
  • 134
2

You can specify min_count=1 in pandas sum:

df['result'] = df[['var1','var2']].sum(axis=1, min_count=1)

Result:

   var1  var2  result
0   1.0   4.0     5.0
1   2.0   NaN     2.0
2   3.0   5.0     8.0
3   NaN   6.0     6.0
4   NaN   NaN     NaN

From docs:

min_countint, default 0
The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

Cainã Max Couto-Silva
  • 4,839
  • 1
  • 11
  • 35