
I've been trawling through existing questions for a solution to this one, but to no avail.

I have a dataset of 117 individuals, each with one observation for each of 12 variables, grouped by a factor variable with 8 levels.

I would like to do a canonical analysis of principal co-ordinates on these data based on the Anderson and Willis approach. I started by using BiodiversityR::CAPdiscrim. Let's start with some example data:

individual <- c(1:30)
group <- rep(c("a","b","c"), 10)
Var1 <- rnorm(n = 30, mean = 3.0e-4,sd = 2.0e-6)
Var2 <- rnorm(n = 30, mean = 2.4e-4,sd = 2.0e-6)
Var3 <- rnorm(n = 30, mean = 7.0e-6,sd = 9.0e-9)
Var4 <- rnorm(n = 30, mean = 4.2e-5,sd = 1.0e-6)
Var5 <- rnorm(n = 30, mean = 1.0e-4,sd = 9.0e-6)
Var6 <- rnorm(n = 30, mean = 8.0e-5,sd = 1.0e-5)

# Build the data frame directly: wrapping the columns in cbind() first would
# coerce every column to character, and the Var columns must stay numeric.
df <- data.frame(individual, group = factor(group), Var1, Var2, Var3, Var4, Var5, Var6)

CAPdiscrim needs the response variables in their own data frame, which goes on the left-hand side of the formula:

vars <- df[3:8]

Now we can run CAPdiscrim on the data:

BiodiversityR::CAPdiscrim(vars~group,
                          data = df,
                          dist = "euclidean",
                          axes = 4,
                          m = 0,
                          permutations = 999)

Which returns:

Error in lda.default(x, grouping, ...) : variable 1 appears to be constant within groups

We can use caret::nearZeroVar to check whether this is true (it appears not to be):

vars_check <- caret::nearZeroVar(vars, saveMetrics = TRUE, names = TRUE)
vars_check

     freqRatio percentUnique zeroVar   nzv
Var1         1           100   FALSE FALSE
Var2         1           100   FALSE FALSE
Var3         1           100   FALSE FALSE
Var4         1           100   FALSE FALSE
Var5         1           100   FALSE FALSE
Var6         1           100   FALSE FALSE
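A possible reason the two checks disagree: nearZeroVar looks at how many distinct values a column has, whereas lda() (as far as I can tell from MASS::lda.default, so worth confirming against the source) compares each variable's within-group standard deviation with its tol argument, which defaults to 1.0e-4. The sketch below reproduces that comparison by hand on a hypothetical stand-in for the example data (same scale, fixed seed, three of the six variables).

```r
# Hypothetical stand-in for the example data: same scale, fixed seed.
set.seed(1)
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(30, mean = 3.0e-4, sd = 2.0e-6),
  Var3 = rnorm(30, mean = 7.0e-6, sd = 9.0e-9),
  Var6 = rnorm(30, mean = 8.0e-5, sd = 1.0e-5)
)

# Group means per variable: a 3 x 3 matrix with rows a, b, c.
group_means <- apply(vars, 2, function(v) tapply(v, group, mean))

# Centre every observation on its own group mean, then take the standard
# deviation of what is left: the pooled within-group spread.
centred <- as.matrix(vars) - group_means[as.character(group), ]
within_sd <- apply(centred, 2, sd)

lda_default_tol <- 1.0e-4  # default value of the tol argument of MASS::lda
data.frame(within_sd, below_tol = within_sd < lda_default_tol)
```

Every variable varies, but on this scale each within-group spread is far below the absolute default tolerance, which would explain the "constant within groups" message.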

I saw other questions about this error that were specific to lda(), and I noticed that CAPdiscrim() calls vegdist(), cmdscale() and lda(), so I tried to break the analysis down piece by piece:

library(vegan)  # vegdist()
library(MASS)   # lda()

dist_matrix <- vegdist(vars,
                       method = "euclidean",
                       binary = FALSE,
                       diag = FALSE,
                       upper = FALSE,
                       na.rm = TRUE)

PCA_vars <- cmdscale(d = dist_matrix,
                     k = 5,
                     eig = TRUE,
                     add = FALSE,
                     x.ret = FALSE)

LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group)

Which returns a very similar result:

Error in lda.default(x, grouping, ...) : variables 1 2 3 4 5 appear to be constant within groups

lda() has a tol argument (default 1.0e-4) that can be lowered to get past this error when dealing with very small numbers, so I can do this:

LDA_pldist <- lda(x = PCA_vars$points,
                  grouping = df$group,
                  tol = 1.0e-25)
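To see the effect of tol in isolation, here is a minimal, self-contained sketch. It assumes stand-in data on the same scale as the example (fixed seed, three variables) and uses stats::dist() in place of vegdist(), since the two should give the same Euclidean distances. The default call is wrapped in tryCatch() so the error message can be inspected rather than halting the script.

```r
library(MASS)

# Hypothetical stand-in data, same scale as the example above.
set.seed(1)
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(30, mean = 3.0e-4, sd = 2.0e-6),
  Var5 = rnorm(30, mean = 1.0e-4, sd = 9.0e-6),
  Var6 = rnorm(30, mean = 8.0e-5, sd = 1.0e-5)
)

# Principal coordinates from Euclidean distances (first two axes).
scores <- cmdscale(dist(vars), k = 2)

# Default tol (1.0e-4): expected to stop with the "constant within groups" error.
default_fit <- tryCatch(lda(x = scores, grouping = group),
                        error = function(e) conditionMessage(e))

# Much smaller tol: the same call is expected to go through.
small_tol_fit <- lda(x = scores, grouping = group, tol = 1.0e-25)

is.character(default_fit)       # TRUE if the default call failed
inherits(small_tol_fit, "lda")  # TRUE if the small-tol call produced a fit
```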

This provides some output, but doesn't include some of the features of CAPdiscrim such as allowing the function to determine the best number for "m" through permutations.

Can anyone suggest how to modify the tolerance in CAPdiscrim(), or how to carry out manually, with these other functions, what CAPdiscrim() is doing under the hood?
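For reference, the manual route could look roughly like the sketch below. This is not CAPdiscrim()'s actual implementation, only a simplified approximation of the general idea: compute principal coordinates, try each number of axes m, score each with leave-one-out classification from lda(CV = TRUE), keep the best m, then permute the group labels to judge significance. The data are a hypothetical stand-in (fixed seed), stats::dist() replaces vegdist() for Euclidean distances, and the small tol and the 99 permutations are illustrative choices.

```r
library(MASS)

# Hypothetical stand-in data, same scale as the example above.
set.seed(1)
group <- factor(rep(c("a", "b", "c"), 10))
vars <- data.frame(
  Var1 = rnorm(30, mean = 3.0e-4, sd = 2.0e-6),
  Var2 = rnorm(30, mean = 2.4e-4, sd = 2.0e-6),
  Var4 = rnorm(30, mean = 4.2e-5, sd = 1.0e-6),
  Var5 = rnorm(30, mean = 1.0e-4, sd = 9.0e-6),
  Var6 = rnorm(30, mean = 8.0e-5, sd = 1.0e-5)
)

# Principal coordinates: with 5 variables there are at most 5 useful axes.
pcoa <- cmdscale(dist(vars), k = ncol(vars), eig = TRUE)
max_m <- ncol(pcoa$points)

# Leave-one-out percentage of correct classifications using the first m axes.
loo_percent <- function(m, grouping) {
  fit <- lda(x = pcoa$points[, seq_len(m), drop = FALSE],
             grouping = grouping, tol = 1.0e-25, CV = TRUE)
  mean(fit$class == grouping) * 100
}

# Try every m and keep the one that classifies best.
loo_correct <- sapply(seq_len(max_m), loo_percent, grouping = group)
best_m <- which.max(loo_correct)
observed <- loo_correct[best_m]

# Permutation test: shuffle the labels and repeat the scoring at best_m.
n_perm <- 99
permuted <- replicate(n_perm, loo_percent(best_m, sample(group)))
p_value <- (sum(permuted >= observed) + 1) / (n_perm + 1)

list(best_m = best_m, percent_correct = observed, p_value = p_value)
```

Because the stand-in groups are drawn from the same distribution, the classification success should hover around chance; the point is the workflow, not the numbers.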

Any insight would be greatly appreciated.

J.Con

2 Answers


I was experiencing the exact same problem. After updating the package to BiodiversityR_2.8-3, the error went away.
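A quick way to check whether an installation is older than that release is sketched below. The variable names are illustrative, and the check treats a missing package the same as an outdated one.

```r
# Version mentioned above as the one where the error went away.
required <- package_version("2.8-3")

if (requireNamespace("BiodiversityR", quietly = TRUE)) {
  installed <- utils::packageVersion("BiodiversityR")
  needs_update <- installed < required
} else {
  # Package not installed at all: installing it pulls the current release.
  needs_update <- TRUE
}

if (needs_update) {
  message("Run install.packages(\"BiodiversityR\") to get a recent version.")
}
```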

(Using the data you provided)

BiodiversityR::CAPdiscrim(vars~group,
                          data = df,
                          dist = "euclidean",
                          axes = 4,
                          m = 0,
                          permutations = 999)
#Percentage of correct classifications was 26.66667 
#Significance of this percentage was 0.98999 

#Overall classification success (m=1) : 26.6666666666667 percent
#a (n=10) correct: 10 percent
#b (n=10) correct: 70 percent
#c (n=10) correct: 0 percent
#Warning message:
#In cmdscale(distmatrix, k = nrow(x) - 1, eig = T, add = add) :
# only 18 of the first 29 eigenvalues are > 0
J.Con
  • Thanks for the update @J.Con. I contacted the author and they provided me with a temporary fix whilst they were in the process of updating the package. I'm happy to hear that the new update has fixed the problem as well. – Aaarrrgh's My Game Jul 01 '17 at 16:42

The author of BiodiversityR::CAPdiscrim has fixed the problem, and the fix has been rolled out in subsequent package updates. The cause was that some error checks relied on absolute thresholds, which make sense from an ecology perspective, rather than on thresholds relative to the scale of the input data.
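To illustrate the difference between an absolute and a relative check, here is a small base-R sketch. It is not the actual fix made in BiodiversityR; the two helper functions and both thresholds are hypothetical, chosen only to show why the verdict of an absolute check depends on the units of the data.

```r
set.seed(1)
x_small <- rnorm(30, mean = 3.0e-4, sd = 2.0e-6)  # units as in the question
x_large <- x_small * 1.0e6                        # same data, different units

abs_tol <- 1.0e-4  # hypothetical absolute threshold
rel_tol <- 1.0e-4  # hypothetical threshold relative to the data's magnitude

# Absolute check: the verdict flips when only the units change.
looks_constant_abs <- function(x) sd(x) < abs_tol

# Relative check: compares spread with typical magnitude, so units cancel.
looks_constant_rel <- function(x) sd(x) < rel_tol * mean(abs(x))

c(small = looks_constant_abs(x_small), large = looks_constant_abs(x_large))
c(small = looks_constant_rel(x_small), large = looks_constant_rel(x_large))
```

The absolute check flags the small-unit version as constant and the rescaled version as fine, while the relative check gives the same answer for both.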