
EDIT: The problem was not in the geoMean function but in an incorrect use of aggregate(), as explained in the comments.

I am trying to calculate the geometric mean of multiple measurements for several different species; the data include NAs. An example of my data looks like this:

species <- c("Ae", "Ae", "Ae", "Be", "Be")
phen <- c(2, NA, 3, 1, 2)
hveg <- c(NA, 15, 12, 60, 59)

df <- data.frame(species, phen, hveg)

When I try to calculate the geometric mean per species with the function geoMean from the package EnvStats like this

library("EnvStats")
aggregate(df[, 2:3], list(df$species), geoMean, na.rm=TRUE)

it works wonderfully and skips the NAs to give me the geometric means per species.

  Group.1     phen     hveg
1      Ae 2.449490 13.41641
2      Be 1.414214 59.49790

When I do this with my large dataset, however, the function stumbles over NAs and returns NA as the result, even though there are, e.g., 10 numerical values and only one NA. This happens, for example, with the column SLA_mm2/mg. My large dataset looks like this:

> str(cut2trait1)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   22 obs. of  19 variables:
 $ Cut                : chr  "15_08" "15_08" "15_08" "15_08" ...
 $ Block              : num  1 1 1 1 1 1 1 1 1 1 ...
 $ ID                 : num  451 512 431 531 591 432 551 393 511 452 ...
 $ Plot               : chr  "1_1" "1_1" "1_1" "1_1" ...
 $ Grazing            : chr  "n" "n" "n" "n" ...
 $ Acro               : chr  "Leuc.vulg" "Dact.glom" "Cirs.arve" "Trif.prat" ...
 $ Sp                 : chr  "Lv" "Dg" "Ca" "Tp" ...
 $ Label_neu          : chr  "Lv021" "Dg022" "Ca021" "Tp021" ...
 $ PlantFunctionalType: chr  "forb" "grass" "forb" "forb" ...
 $ PlotClimate        : chr  "AC" "AC" "AC" "AC" ...
 $ Season             : chr  "Aug" "Aug" "Aug" "Aug" ...
 $ Year               : num  2015 2015 2015 2015 2015 ...
 $ Tiller             : num  6 3 3 5 6 8 5 2 1 7 ...
 $ Hveg               : num  25 38 70 36 68 65 23 58 71 27 ...
 $ Hrep               : num  39 54 77 38 76 70 65 88 98 38 ...
 $ Phen               : num  8 8 7 8 8 7 6.5 8 8 8 ...
 $ SPAD               : num  40.7 42.4 48.7 43 31.3 ...
 $ TDW_in_g           : num  4.62 4.85 11.86 5.82 8.99 ...
 $ SLA_mm2/mg         : num  19.6 19.8 20.3 21.2 21.7 ...

and the result of my code

gm_cut2trait1 <- aggregate(cut2trait1[, 13:19], list(cut2trait1$Sp), geoMean, na.rm=TRUE)

is (only the first two rows):

Group.1    Tiller     Hveg      Hrep     Phen     SPAD  TDW_in_g     SLA_mm2/mg
1       Ae 13.521721 73.43485 106.67933       NA 28.17698 1.2602475           NA
2       Be  8.944272 43.95452  72.31182 5.477226 20.08880 0.7266361   9.309672

Here, the geometric mean of SLA for Ae is NA, even though there are 9 numeric measurements and only one NA in the column used to calculate the geometric mean.
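Counts like these can be checked per species before aggregating. The following is a minimal sketch on the example data from above, not on the large dataset; the helper name `count_values` is hypothetical. It tabulates how many values in a column are usable, missing, or non-positive (the geometric mean is only defined for positive values).

```r
# Example data from the question (only the phen column is needed here)
species <- c("Ae", "Ae", "Ae", "Be", "Be")
phen <- c(2, NA, 3, 1, 2)
df <- data.frame(species, phen)

# Hypothetical helper: classify the values of one group
count_values <- function(x) {
  c(n_ok     = sum(!is.na(x) & x > 0),   # usable positive values
    n_na     = sum(is.na(x)),            # missing values
    n_nonpos = sum(x <= 0, na.rm = TRUE)) # zeros or negatives
}

# One row per species, one column per count
counts <- t(sapply(split(df$phen, df$species), count_values))
print(counts)
```

If a group shows at least one usable value and no non-positive values but the aggregated result is still NA, the NA handling of the summary function is the likely culprit rather than the data.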

I also tried the geometric mean function suggested in the question "Geometric Mean: is there a built-in?". Instead of NA, it returned the value 1.000 when used with my big dataset, which doesn't solve my problem.

So my question is: What is the difference between my example df and the big dataset that throws the geoMean function off the rails?

  • It is difficult to say what the difference is between the example dataset and your other dataset, for obvious reasons. Since you are computing the geometric mean per group, have you checked that there is at least one observation for each measurement in each group? Are all measurements positive? – dipetkov Mar 27 '19 at 10:16
  • Thank you for your suggestion. Yes, I checked for that and all observations are positive, and there's at least one for each group. – Hedera.Helix Mar 27 '19 at 10:22
  • Then perhaps you can try to find a subset of your big dataset that still gives the error but it is small enough to post here? – dipetkov Mar 27 '19 at 10:26
  • I couldn't do that, which led me to the original mistake: `gm_cut2trait1 <- aggregate(cut2trait1[, 13:19], list(cut2trait1$Sp), geoMean, na.rm=TRUE)` uses the argument `na.rm=TRUE` for the aggregate function, not for the geoMean function. Because of that, geoMean defaulted to `na.rm=FALSE`. – Hedera.Helix Mar 27 '19 at 11:09
  • I found the answer to my problem (using an argument for a function within aggregate) [here](https://stackoverflow.com/questions/35001495/aggregate-function-in-r-with-multiple-function-arguments) – Hedera.Helix Mar 27 '19 at 11:15
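
Following the diagnosis in the comments, one way to remove any doubt about which function receives `na.rm = TRUE` is to wrap the summary function in an anonymous function inside `aggregate()`. The sketch below uses the example data from the question. `geo_mean` is a hypothetical base-R stand-in for `EnvStats::geoMean`, used only so the snippet runs without extra packages; it assumes all non-missing values are positive.

```r
# Example data from the question
species <- c("Ae", "Ae", "Ae", "Be", "Be")
phen <- c(2, NA, 3, 1, 2)
hveg <- c(NA, 15, 12, 60, 59)
df <- data.frame(species, phen, hveg)

# Hypothetical stand-in for EnvStats::geoMean (assumes positive values)
geo_mean <- function(x, na.rm = FALSE) {
  exp(mean(log(x), na.rm = na.rm))
}

# The wrapper sets na.rm = TRUE on the inner call explicitly,
# so it does not depend on how aggregate() forwards extra arguments
gm <- aggregate(df[, c("phen", "hveg")],
                by = list(species = df$species),
                FUN = function(x) geo_mean(x, na.rm = TRUE))
print(gm)
```

With EnvStats loaded, replacing `geo_mean` with `geoMean` in the wrapper should give the same structure of result.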
