Geometric mean of rows in a data table in R

Question

I have a data table where I want to calculate the geometric mean for each row over several columns. Some of the values will have zeros so I need these to be excluded.

The geometric mean from Wiki is: "The geometric mean is defined as the nth root of the product of n numbers" so for 2 numbers its simply the square root of their product.

The nth root in my case will vary on each row depending on how many of the values are non-zeros in them.

In my example below the top 2 rows of the results column was worked out as follows:

1: (a * c)^(1/2)

2: (a * b * c)^(1/3)

so I need the formula to look at columns a:c, take the product of the non-zero values and then take the nth root of how many non-zero values there were.

library(data.table)
dt <- data.table(a = c(0.5, 0.3,0,0.6), b = c(0,0.4,0.1,0), 
c = c(0.9,0.5,0.1,0), Result = c(0.67, 0.39, 0.1, 0.6))

Apologies, Result is expected output and all values are non-negative. — MidnightDataGeek, Jan 19 '17 at 12:08

akrun · Accepted Answer · 2017-01-19T12:53:56.020

4

We can try using data.table methods

dt[, v1 := Reduce(`+`, lapply(.SD, function(x) x!=0)), .SDcols = 1:3]
dt[, result2 := round((Reduce(`*`, lapply(.SD, function(x) 
    replace(x, x==0, 1))))^(1/v1), 2), .SDcols = 1:3][, v1 := NULL][]
#    a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60

Or another less efficient option is to group by sequence of rows and then do it on each row

dt[, result2 := {
           u1 <- unlist(.SD)
           round(prod(u1[u1!=0])^(1/sum(u1!=0)), 2)} , 1:nrow(dt), .SDcols = 1:3]
dt
#     a   b   c Result result2
#1: 0.5 0.0 0.9   0.67    0.67
#2: 0.3 0.4 0.5   0.39    0.39
#3: 0.0 0.1 0.1   0.10    0.10
#4: 0.6 0.0 0.0   0.60    0.60

NOTE: Both of these are data.table methods.

Or another option contributed by @DavidArenburg

dt[, Result := round(Reduce(`*`, replace(.SD, .SD == 0, 1))^(1/rowSums(.SD != 0)), 2)]

Another vectorized option is to convert to matrix

library(matrixStats)
m1 <- as.matrix(setDF(dt)[1:3])
round(rowProds(replace(m1, !m1, 1))^(1/rowSums(m1!=0)), 2)
#[1] 0.67 0.39 0.10 0.60

edited Jan 19 '17 at 12:53

answered Jan 19 '17 at 12:07

akrun

874,273
37
540
662

the code block `dt[, result2....' returns an error...Error in eval(expr, envir, enclos) : object 'v1' not found – MidnightDataGeek Jan 19 '17 at 12:35
@MidnightDataGeek Which version of `data.table` you have? I am using `1.10.0` – akrun Jan 19 '17 at 12:35
I'm using 1.9.6. I've modified the answer below, is it still a 'data.table' method strictly speaking? Speed is key for me so I try and use DT for everything, – MidnightDataGeek Jan 19 '17 at 12:43
@MidnightDataGeek Okay, that is the reason, WIth 1.10.0, the other columns can also be accessed when using `.SDcols`. Yes, the first two are data.table methods. But, if you are using `apply`, it could convert to `matrix` and there is no gain in using `data.table` – akrun Jan 19 '17 at 12:44
2

Maybe a bit simpler ```dt[, Result := round(Reduce(`*`, replace(.SD, .SD == 0, 1))^(1/rowSums(.SD != 0)), 2)]``` – David Arenburg Jan 19 '17 at 12:51
@DavidArenburg That is nice, but are we allowed to `==` in `.SD` and `rowSums` with `data.table` – akrun Jan 19 '17 at 12:52
It's less recommended than `lapply` but your solution is pretty unreadable – David Arenburg Jan 19 '17 at 12:53
@DavidArenburg Thanks. Your solution is quick and easy to read. Amazing how you guys can come up with solutions so fast! – MidnightDataGeek Jan 19 '17 at 13:06

score 1 · Answer 2 · answered Jan 19 '17 at 12:11

1

This will also work, assuming all non-negative values.

dt$Result <- apply(dt, 1, function(x) (prod(x[x!=0]))^(1/sum(x!=0)))
dt
#     a   b   c    Result
#1: 0.5 0.0 0.9 0.6708204
#2: 0.3 0.4 0.5 0.3914868
#3: 0.0 0.1 0.1 0.1000000
#4: 0.6 0.0 0.0 0.6000000

answered Jan 19 '17 at 12:11

Sandipan Dey

21,482
2
51
63

1

Thanks. I have modified it for my needs... `dt[, Result := apply(.SD, 1, function(x) (prod(x[x != 0])) ^ (1/sum(x!=0))), .SDcols = 1:3]` – MidnightDataGeek Jan 19 '17 at 12:36

score 1 · Answer 3 · answered May 03 '17 at 14:59

1

prod(a)^(1/length(a)) gives geometric mean for vector a

answered May 03 '17 at 14:59

Ajay Ohri

3,382
3
30
60

As stated here, the equation for geomean above that is being up voted is wrong! It is to the power 1/n. No “sum” should be there. I’m surprised no one previously upvoted this comment! – RichardBJ Jul 10 '21 at 16:12

score -1 · Answer 4 · answered Jan 08 '19 at 23:05

-1

Other option:

m1 <- as.matrix(setDF(dt)[1:3])
exp(rowMeans(log(m1)))

answered Jan 08 '19 at 23:05

Tom Wenseleers

7,535
7
63
103

Geometric mean of rows in a data table in R

4 Answers4

Linked