
I would like to calculate the geometric mean of some data that contains NaN values. How can I do it?

I know how to calculate the arithmetic mean while ignoring NaNs, using the following code:

import numpy as np
M = np.nanmean(data, axis=2)

So how can I do the same for the geometric mean?

MSeifert
ERIC
  • https://stackoverflow.com/questions/19852586/get-mean-value-avoiding-nan-using-numpy-in-python or https://stackoverflow.com/questions/5480694/numpy-calculate-averages-with-nans-removed for a more efficient and slightly longer version, but replace `mean` with `geomean` – Ry- Jun 04 '17 at 08:54

1 Answer


You could use the following identity (I only found it in the German Wikipedia, but there are probably other sources as well):

$$\bar{x}_{\mathrm{geom}} = a^{\frac{1}{n} \sum_{i=1}^{n} \log_a x_i}$$

This identity can be derived by applying the logarithm rules to the normal definition of the geometric mean:

$$\bar{x}_{\mathrm{geom}} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
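
Written out step by step, the derivation might look like this. It is a sketch that assumes $x_1, \dots, x_n$ are the $n$ non-NaN values, that all of them are positive, and that $a$ is any valid logarithm base:

```latex
\begin{aligned}
\bar{x}_{\mathrm{geom}}
  &= \sqrt[n]{\prod_{i=1}^{n} x_i}
   = \left(\prod_{i=1}^{n} x_i\right)^{1/n} \\
  &= a^{\log_a \left(\prod_{i=1}^{n} x_i\right)^{1/n}}
   && \text{since } y = a^{\log_a y} \\
  &= a^{\frac{1}{n} \log_a \prod_{i=1}^{n} x_i}
   && \text{since } \log_a y^{c} = c \log_a y \\
  &= a^{\frac{1}{n} \sum_{i=1}^{n} \log_a x_i}
   && \text{since } \log_a (y z) = \log_a y + \log_a z
\end{aligned}
```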

The base a can be chosen arbitrarily, so you could use np.log (with np.exp as the inverse operation):

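import numpy as np

def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    # 1 / n, where n is the number of non-NaN entries along the axis
    inverse_valids = 1. / np.sum(~np.isnan(arr), axis=axis)  # could be a problem for all-nan-axis
    # sum of the logarithms; np.nansum treats NaN as zero
    rhs = inverse_valids * np.nansum(np.log(arr), axis=axis)
    # undo the logarithm
    return np.exp(rhs)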

And it seems to work:

>>> l = [[1, 2, 3], [1, np.nan, 3], [np.nan, 2, np.nan]]

>>> nangmean(l)  
1.8171205928321397

>>> nangmean(l, axis=1)  
array([ 1.81712059,  1.73205081,  2.        ])

>>> nangmean(l, axis=0)  
array([ 1.,  2.,  3.])
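
The all-NaN case mentioned in the code comment comes down to a division by zero: the count of valid values is 0 there. One possible way to handle it is sketched below, assuming you want NaN for such slices and do not want a RuntimeWarning. The name nangmean_safe is made up for this example:

```python
import numpy as np

def nangmean_safe(arr, axis=None):
    # Hypothetical variant of nangmean that returns NaN for all-NaN slices.
    arr = np.asarray(arr, dtype=float)
    n_valid = np.sum(~np.isnan(arr), axis=axis)
    log_sum = np.nansum(np.log(arr), axis=axis)
    # Silence the 0 / 0 warning that an all-NaN slice would trigger.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = np.exp(log_sum / n_valid)
    # Make the all-NaN case explicit instead of relying on 0 / 0 == nan.
    return np.where(n_valid == 0, np.nan, result)

rows = [[1, 2, 3], [np.nan, np.nan, np.nan]]
result = nangmean_safe(rows, axis=1)
```

Here the first row gives the ordinary geometric mean of 1, 2 and 3, and the second row (all NaN) gives NaN.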

NumPy 1.10 also added np.nanprod, so you could use the normal definition directly:

import numpy as np

def nangmean(arr, axis=None):
    arr = np.asarray(arr)
    valids = np.sum(~np.isnan(arr), axis=axis)
    prod = np.nanprod(arr, axis=axis)
    return np.power(prod, 1. / valids)
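
One thing to keep in mind when choosing between the two: the product of many values can overflow (or underflow) a float64 long before the geometric mean itself is out of range, while the sum of logarithms stays small. A rough illustration, with made-up data and the two variants renamed nangmean_log and nangmean_prod so that they can live side by side:

```python
import numpy as np

def nangmean_log(arr, axis=None):
    # Log-based variant: exp of the mean of the logs of the non-NaN values.
    arr = np.asarray(arr, dtype=float)
    n_valid = np.sum(~np.isnan(arr), axis=axis)
    return np.exp(np.nansum(np.log(arr), axis=axis) / n_valid)

def nangmean_prod(arr, axis=None):
    # Product-based variant: n-th root of the product of the non-NaN values.
    arr = np.asarray(arr, dtype=float)
    n_valid = np.sum(~np.isnan(arr), axis=axis)
    return np.power(np.nanprod(arr, axis=axis), 1. / n_valid)

# Made-up example data: 1000 entries of 1e10, every 100th replaced by NaN.
data = np.full(1000, 1e10)
data[::100] = np.nan

with np.errstate(over="ignore"):
    via_prod = nangmean_prod(data)  # the intermediate product overflows to inf
via_log = nangmean_log(data)        # stays finite, approximately 1e10
```

For small inputs like the example above both versions agree, so this only matters for long axes or values far from 1.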
MSeifert