1

To follow the exact methodology presented in an article I would like to calculate the Logarithmic mean of a data vector. I did not find any functions for this in R, or any previous discussions. The case for 2 numbers is clear, but I could not work out the most efficient method to calculate the log mean for a large vector of numbers. Any suggestions?

# Calculating different types of means

# some data
dat <- c(0.008845299, 0.040554701)

# arithmetic mean
arith.m <- mean(dat)

# logarithmic mean
# http://en.wikipedia.org/wiki/Logarithmic_mean
log.m <- (dat[1] - dat[2])/(log(dat[1])-log(dat[2]))

# geometric mean
# http://stackoverflow.com/questions/2602583/geometric-mean-is-there-a-built-in
geo_mean <- function(data) {
  log_data <- log(data)
  gm <- exp(mean(log_data[is.finite(log_data)]))
  return(gm)
}

geo.m <- geo_mean(dat)

# show arithmetic > logarithmic > geometric
arith.m; log.m; geo.m

# how to calculate logarithmic mean for a vector?
dat.n <- c(0.008845299, 0.040554701, 0.047645299, 0.036654701, 0.017345299, 0.018754701, 0.032954701, 0.043145299, 0.026845299, 0.033054701, 0.025554701)

UPDATE with calculation that strips out 0 values (BUT, as pointed out below is this valid?):

# add a very low number (generally considered zero in R)
nzero <- 1.940656e-324
dat.n <- c(dat.n, nzero)

# arithmetic mean
arith.m <- mean(dat.n)

geo_mean <- function(data) {
  log_data <- log(data)
  gm <- exp(mean(log_data[is.finite(log_data)]))
  return(gm)
}

geo.m <- geo_mean(dat.n)

lmv <- function(x){
  ddlog <- function(x){
    d <- rep(0, length(x))
    for (i in 1:length(x)){
      d[i] <- prod(x[i] - x[-i])
    }
    sum(log(x)[is.finite(log(x))]/d[is.finite(log(x))])
  }
  n <- length(x[which(x>0)]) - 1
  ((-1)^(n+1)*n*ddlog(x))^(-1/n)
}

log.m <- lmv(dat.n)

# show arithmetic > logarithmic > geometric
arith.m; log.m; geo.m
  • There's no function for it because it's simple: `diff(dat.n)/diff(log(dat.n))`. – Joshua Ulrich Mar 01 '13 at 15:15
  • @JoshuaUlrich: isn't that still for the case with just two numbers? – Aaron left Stack Overflow Mar 01 '13 at 15:23
  • 1
    @EdwardP.Morris: how is this defined when you have more than two numbers? The wikipedia page you link to only discusses that case. Are you thinking of the log-average, which is the same as the geometric mean? http://en.wikipedia.org/wiki/Geometric_mean#Log-average – Aaron left Stack Overflow Mar 01 '13 at 15:25
  • just leaving out zero or nearly-zero values is **extremely** problematic. For example, what is the geometric mean of `c(1e-1000,1,1)`? It should be `10^(-998/3)`. – Ben Bolker Mar 03 '13 at 16:37
  • @BenBolker Interesting! – Edward P. Morris Mar 03 '13 at 17:20
  • Less than about 4.940656e-324 appears to be essentially zero. The `psych` `geometric.mean` function also suffers from this problem. Maybe "High precision arithmetic" available in package Rmpfr (see also Rmpfr at R-Forge) might be needed. I guess in most situations it is not an issue, but it is good to get a feel for the limitations of the calculations. For my data, which includes near-zeros, I think I should use different summary statistics. – Edward P. Morris Mar 03 '13 at 17:40
  • (1) I agree that the geometric mean/log-average is not generally going to be useful when you have near-zero values (it will be near zero itself). (2) There are various packages for dealing with very small numbers (`Rmpfr` is one, `Brobdingnag` is another), if for some reason you really do need to do this. (3) `.Machine$double.xmin` will tell you the smallest number that is representable in floating point on your machine. – Ben Bolker Mar 03 '13 at 23:26

2 Answers2

4

Followed by wiki (generalized to (n+1) values):

http://en.wikipedia.org/wiki/Divided_difference#Expanded_form

http://en.wikipedia.org/wiki/Logarithmic_mean#Mean_value_theorem_of_differential_calculus_2

ddlog <- function(x){
  d <- rep(0, length(x))
  for (i in 1:length(x)){
    d[i] <- prod(x[i] - x[-i])
  }
  sum(log(x)/d)
}

# ddlog is to get divided difference of the logarithm.

lmv <- function(x){
  n <- length(x) - 1
  ((-1)^(n+1)*n*ddlog(x))^(-1/n)
}

R > a <- c(0.008845299, 0.040554701, 0.047645299, 0.036654701, 0.017345299, 0.018754701, 0.032954701, 0.043145299, 0.026845299, 0.033054701, 0.025554701)
R > 
R > lmv(a)
[1] 0.0277
liuminzhao
  • 2,385
  • 17
  • 28
  • Excellent, thanks that seems to do the job. My maths is a getting a bit rusty, I shall study up on the links you provided. – Edward P. Morris Mar 01 '13 at 16:13
  • Ok so the above works with the example given, however its returns NaN when there is a "zero" or number R considers to be a zero in the vector of numbers to be averaged. – Edward P. Morris Mar 03 '13 at 16:03
  • 1
    @EdwardP.Morris, I guess logarithmic mean only works for non-negative numbers. Copied from wiki for two numbers case: 'In mathematics, the logarithmic mean is a function of two non-negative numbers which is equal to their difference divided by the logarithm of their quotient.' – liuminzhao Mar 03 '13 at 16:09
  • Yes that seems reasonable. I think the issue is now that I have to remove all numbers in my vector that are small enough to be considered "zero", which according to http://rwiki.sciviews.org/doku.php?id=misc:r_accuracy is about 4.940656e-324. – Edward P. Morris Mar 03 '13 at 16:26
  • `is.finite` takes care of `Inf` problems in the log transformation, however the function returns NaN at `((-1)^(n+1)*n*ddlog(x))^(-1/n)`. Which is `-1.111856e+19 ^-0.09090909`. – Edward P. Morris Mar 03 '13 at 16:42
  • Sorry. silly mistake forgot to adjust `n` in the `lmv` function! – Edward P. Morris Mar 03 '13 at 16:46
  • Now updated liuminzhao's solution adjusted to strip out non-positive numbers above. – Edward P. Morris Mar 03 '13 at 17:16
2

Try this:

> -diff(dat.n)/-diff(log(dat.n))
 [1] 0.02082356 0.04400483 0.04191009 0.02580711 0.01804083 0.02519117 0.03782146 0.03435320 0.02984241
[10] 0.02914404
Jilber Urbina
  • 58,147
  • 10
  • 114
  • 138