I have been trying to figure out why the outputs of these three standardization methods do not compare as equal with `==`, even though the printed values look numerically identical.

library(vegan)

# subset data
env.data <- mite.env[1:10, c("SubsDens", "WatrCont")]

# method 1
env.data.x <- env.data
env.data.x$SubsDens <- as.vector(scale(env.data.x$SubsDens))
env.data.x$WatrCont <- as.vector(scale(env.data.x$WatrCont))

# method 2
env.data.y <- env.data
env.data.y <- as.data.frame(decostand(as.matrix(env.data.y), method = "standardize"))

# method 3
env.data.z <- env.data
normalize <- function(x){
  return((x - mean(x))/sd(x))
}
env.data.z$SubsDens <- normalize(env.data.z$SubsDens)
env.data.z$WatrCont <- normalize(env.data.z$WatrCont)

# comparison
env.data.x == env.data.y
env.data.x == env.data.z
env.data.y == env.data.z

Here is the output:

> env.data.x == env.data.y
   SubsDens WatrCont
1      TRUE     TRUE
2      TRUE     TRUE
3      TRUE     TRUE
4      TRUE     TRUE
5      TRUE     TRUE
6      TRUE     TRUE
7      TRUE     TRUE
8      TRUE     TRUE
9      TRUE     TRUE
10     TRUE     TRUE
> env.data.x == env.data.z
   SubsDens WatrCont
1     FALSE     TRUE
2     FALSE     TRUE
3     FALSE     TRUE
4     FALSE     TRUE
5     FALSE     TRUE
6     FALSE     TRUE
7     FALSE     TRUE
8     FALSE     TRUE
9     FALSE     TRUE
10    FALSE     TRUE
> env.data.y == env.data.z
   SubsDens WatrCont
1     FALSE     TRUE
2     FALSE     TRUE
3     FALSE     TRUE
4     FALSE     TRUE
5     FALSE     TRUE
6     FALSE     TRUE
7     FALSE     TRUE
8     FALSE     TRUE
9     FALSE     TRUE
10    FALSE     TRUE

Method 3, standardizing using the formula as a function, seems to be doing something different...

Thank you in advance for your answers!

  • Lacking any sample data, I'll guess it is related to https://stackoverflow.com/q/9508518 – r2evans Dec 22 '21 at 14:31
  • Could just be a floating point difference? Hard to say without seeing `env.data` or the outputs. Try `dput` for copying data and maybe look at the values of `env.data.z - env.data.y`. If it's a really tiny difference like 1e-10, then it's not worth worrying about. When testing numerics I find it's best to first agree on a number of decimal places that gives the accuracy you need, round to it, then test. – Jonny Phelps Dec 22 '21 at 14:33
  • Instead of `x == y` (on individual columns), try `abs(x - y) < 1e-9` (or some meaningfully small number that is below the domain of your real numbers and above `.Machine$double.eps`). – r2evans Dec 22 '21 at 14:34

1 Answer

Thank you Jonny Phelps and r2evans for your comments.

I should've just checked the difference between the columns.

env.data.x - env.data.z

Output was on the order of 1e-16, so not at all significant for my purposes.
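
For anyone hitting the same thing, here is a minimal sketch of comparing with a tolerance instead of `==`. The vector `x` is a hypothetical stand-in for one column (not the actual `mite.env` values), and the `1e-9` threshold is just an assumed tolerance; pick one that suits your data.

```r
# Hypothetical stand-in for a single column; NOT the real mite.env values
x <- c(39.18, 54.99, 46.07, 48.19, 23.55, 57.32, 36.95, 42.31, 51.77, 30.04)

# Same formula as method 3 in the question
normalize <- function(x) (x - mean(x)) / sd(x)

a <- as.vector(scale(x))  # method 1
b <- normalize(x)         # method 3

# Size of the largest discrepancy between the two results
max_diff <- max(abs(a - b))

# all.equal() compares within a tolerance (about 1.5e-8 by default);
# wrap it in isTRUE() because it returns a message, not FALSE, on mismatch
same_default <- isTRUE(all.equal(a, b))

# Elementwise check with an explicit, assumed tolerance
same_explicit <- all(abs(a - b) < 1e-9)

print(max_diff)
print(same_default)
print(same_explicit)
```

On whole data frames, `all.equal(env.data.x, env.data.z)` should work the same way, though it also compares attributes such as row and column names.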