
I am running a function (`rcompanion::cramerV`) that uses `boot.ci` to generate confidence intervals (CIs). I am getting the following error message, and I'm not sure how to fix it:

```
Error in if (const(t, min(1e-08, mean(t, na.rm = TRUE)/1e+06))) { : missing value where TRUE/FALSE needed
```

It's failing on a specific matrix:

```r
> bad_data <- matrix(c(31, 0, 46, 0, 1, 0), nrow=3, ncol=2)
> bad_data
     [,1] [,2]
[1,]   31    0
[2,]    0    1
[3,]   46    0
```

When I call cramerV without CIs, it computes fine (yielding V = 1, which I recognise is statistically questionable), but when I call it with CIs, e.g. `cramerV(bad_data, ci=TRUE)`, it gives me that error. This happens regardless of the type of CI specified.
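As a sanity check on the V = 1 point estimate, here is a minimal base-R sketch that recomputes Cramér's V for `bad_data` from the chi-squared statistic, V = sqrt(chi2 / (n * (min(rows, cols) - 1))). It does not call rcompanion, so it only illustrates the formula, not cramerV's internals; `cramer_v_point()` is a hypothetical helper name, and no continuity correction is applied (it would not apply to a 3x2 table anyway).

```r
# Hypothetical helper: point estimate of Cramer's V from a count matrix (base R only)
cramer_v_point <- function(m) {
  chi2 <- suppressWarnings(chisq.test(m, correct = FALSE)$statistic)  # small expected counts warn
  n <- sum(m)
  k <- min(dim(m)) - 1                 # min(rows, cols) - 1
  unname(sqrt(chi2 / (n * k)))
}

bad_data <- matrix(c(31, 0, 46, 0, 1, 0), nrow = 3, ncol = 2)
v <- cramer_v_point(bad_data)
v  # 1, up to floating-point rounding: the only observation in column 2 is also the only one in row 2
```

Here the chi-squared statistic equals n (78), which is why V comes out as exactly 1.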

Obviously there are some statistical issues here, but that's not my focus: I'm running cramerV on hundreds of small matrices at once, and a couple of outlier results are fine. I just need to know how to get it to work computationally (and maybe return NA for the CIs in this case).

Edit: I understand that the error happens because the `if` condition evaluates to NA (per this answer), but I don't know why that happens, what `boot.ci` is doing under the hood, or, most importantly, how to prevent it.
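The NA can be reproduced without rcompanion. This sketch assumes that boot's internal helper `const()` is defined as `all(abs(w - mean(w, na.rm = TRUE)) < eps)` (which is how it appears in recent versions of the boot package; the installed version can presumably be inspected with `boot:::const`). Note that `na.rm = TRUE` is applied to the mean but not to `abs(...)`, so any NA replicate propagates into `all()`. If every non-NA replicate is equal, `all()` returns NA rather than TRUE, and `if (NA)` raises exactly this error; if the non-NA replicates differ, `all()` sees at least one FALSE and returns FALSE, so boot.ci carries on. The replicate vectors below are made up for illustration.

```r
# Local copy of what we assume boot's internal const() helper looks like
const <- function(w, eps = 1e-8) all(abs(w - mean(w, na.rm = TRUE)) < eps)

# Made-up replicate vectors: NA marks a resample where V could not be computed
t_all_one <- c(1, 1, NA, 1, NA)    # every valid replicate is 1 (like bad_data)
t_varied  <- c(1, 0.8, NA, 1, NA)  # valid replicates differ

# Same tolerance expression as in the error message
eps1 <- min(1e-8, mean(t_all_one, na.rm = TRUE) / 1e6)
eps2 <- min(1e-8, mean(t_varied, na.rm = TRUE) / 1e6)

const(t_all_one, eps1)  # NA: all(c(TRUE, TRUE, NA, TRUE, NA)) cannot decide
const(t_varied, eps2)   # FALSE: at least one comparison is FALSE

# The failing pattern: an if () whose condition is NA is an error
res <- tryCatch(if (const(t_all_one, eps1)) "constant" else "varies",
                error = function(e) conditionMessage(e))
res  # "missing value where TRUE/FALSE needed"
```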

TY Lim
  • @jay.sf why would the second column be converted to entirely zeroes? – TY Lim Jun 13 '23 at 17:37
  • It's very likely that the second column gets entirely zeroes if the second row doesn't get resampled in a [bootstrap](https://en.m.wikipedia.org/wiki/Bootstrapping_(statistics)) replication with replacement. What then internally happens is e.g. `chisq.test(c(31, 31, 46), c(0, 0, 0))`. In one of the packages you use, this case is not taken into account, resulting in a bad error message. – jay.sf Jun 13 '23 at 18:04
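To put a rough number on this: `bad_data` has n = 78 observations, and only one of them is in the second column. Assuming the bootstrap resamples the 78 individual observations with replacement (an assumption about cramerV's internals, though it is the usual case-resampling scheme for a contingency table), the chance that a replicate never draws that observation is (1 - 1/78)^78, about 0.366. So roughly a third of the replicates would have an all-zero second column. The sketch checks the closed form against a quick simulation.

```r
bad_data <- matrix(c(31, 0, 46, 0, 1, 0), nrow = 3, ncol = 2)
n <- sum(bad_data)                      # 78 observations in total

# Closed form: P(a given observation is never drawn in n draws with replacement)
p_drop <- (1 - 1 / n)^n                 # close to exp(-1)

# Monte Carlo check: expand the table to one row per observation, then resample
cells <- as.data.frame(as.table(bad_data))   # columns Var1, Var2, Freq; levels A, B, ...
long  <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Var1", "Var2")]

set.seed(42)
col2_empty <- replicate(5000, {
  idx <- sample.int(nrow(long), replace = TRUE)
  !any(long$Var2[idx] == "B")           # "B" is as.table()'s default label for column 2
})
mean(col2_empty)                        # should be close to p_drop
```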

2 Answers


Not a full solution, but recognising that the bootstrapping sometimes fails for the reasons outlined by @jay.sf, I created a workaround using `tryCatch`:

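```r
library(rcompanion)

# Wrap cramerV() so that an error (e.g. from boot.ci) yields NA instead of stopping
cramerV_errorhandling <- function(.x, ci=TRUE, ...) {
  # All-NA result shaped like cramerV()'s normal output
  if (isTRUE(ci)) {
    error_output <- data.frame(Cramer.V=NA, lower.ci=NA, upper.ci=NA)
  } else {
    error_output <- data.frame(Cramer.V=NA)
  }
  tryCatch(cramerV(.x, ci=ci, ...), error = function(e) error_output)
}
```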

Note that rcompanion::cramerV returns either a single value if ci=FALSE or a data frame of three values if ci=TRUE.

This version returns a one-column or three-column data frame of NAs if cramerV runs into an error.
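Since the goal is to run this over hundreds of matrices, one way to apply the same `tryCatch` pattern is to map it over a list and bind the rows, keeping the error message so the NA rows can be inspected later. To keep the sketch runnable without rcompanion, it uses a hypothetical stand-in, `toy_stat()`, which computes a point estimate of V and deliberately errors on any matrix with an all-zero row or column. In real use you would pass something like `function(m) cramerV(m, ci = TRUE)` with a matching three-column `na_row` (not tested here). `safe_map()` is also a hypothetical name.

```r
# Apply f to every matrix; on error, return na_row plus the error message
safe_map <- function(mats, f, na_row) {
  rows <- lapply(seq_along(mats), function(i) {
    out <- tryCatch(
      cbind(f(mats[[i]]), error = NA_character_),
      error = function(e) cbind(na_row, error = conditionMessage(e))
    )
    cbind(id = i, out)                  # remember which matrix each row came from
  })
  do.call(rbind, rows)
}

# Hypothetical stand-in for cramerV(); NOT the real function
toy_stat <- function(m) {
  if (any(rowSums(m) == 0) || any(colSums(m) == 0)) stop("degenerate table")
  chi2 <- suppressWarnings(chisq.test(m, correct = FALSE)$statistic)
  data.frame(Cramer.V = unname(sqrt(chi2 / (sum(m) * (min(dim(m)) - 1)))))
}

mats <- list(
  matrix(c(31, 0, 46, 1, 1, 0), nrow = 3),  # fine
  matrix(c(5, 0, 7, 0), nrow = 2),          # all-zero second row: toy_stat() errors
  matrix(c(10, 2, 3, 9), nrow = 2)          # fine
)
results <- safe_map(mats, toy_stat, na_row = data.frame(Cramer.V = NA_real_))
results
```

Keeping the message in its own column makes it easy to tell "failed" rows apart from rows that legitimately returned NA.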

TY Lim

I'm the author of the function.

Amendment:

I updated the function to avoid this error. The following code loads the updated version from the internet. I'll update the function in the package at a later date.

In the updated version, if the bootstrapped values are all equal, it returns a data frame with the Cramer's V value, and NA for the confidence interval limits.

```r
bad_data <- matrix(c(31, 0, 46, 0, 1, 0), nrow=3, ncol=2)

library(boot)

source("http://rcompanion.org/r_script/cramerV_2023_06_15.r")

cramerV(bad_data, ci=TRUE, histogram=TRUE)

cramerV(bad_data, ci=TRUE, reportIncomplete=TRUE, histogram=TRUE)
```

Compare:

```r
bad_data_3 <- matrix(c(31, 0, 46, 1, 1, 0), nrow=3, ncol=2)

library(boot)

source("http://rcompanion.org/r_script/cramerV_2023_06_15.r")

cramerV(bad_data_3, ci=TRUE, histogram=TRUE)

cramerV(bad_data_3, ci=TRUE, reportIncomplete=TRUE, histogram=TRUE)
```
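The idea of returning NA limits when the bootstrapped values are all equal can also be sketched outside the package: run `boot::boot()` directly, and only call `boot.ci()` when the finite replicates actually vary. This is an illustration of the approach, not the code in the updated cramerV. The names `to_long()`, `cramer_v_long()` and `safe_perc_ci()` are hypothetical; the statistic returns NA for any resample that loses a whole row or column category, only the percentile interval is computed, and the 1e-8 tolerance is chosen to mirror the one in the error message.

```r
library(boot)

# Expand a count matrix to one row per observation (factor levels keep empty categories)
to_long <- function(m) {
  cells <- as.data.frame(as.table(m))
  long <- cells[rep(seq_len(nrow(cells)), cells$Freq), c("Var1", "Var2")]
  setNames(long, c("row", "col"))
}

# Cramer's V for the resampled rows; NA if a row or column category disappears
cramer_v_long <- function(d, idx) {
  tab <- table(d$row[idx], d$col[idx])
  if (any(rowSums(tab) == 0) || any(colSums(tab) == 0)) return(NA_real_)
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * (min(dim(tab)) - 1))))
}

# Call boot.ci() only when the finite replicates are not all equal
safe_perc_ci <- function(b, conf = 0.95) {
  ok <- b$t[is.finite(b$t[, 1]), 1]
  if (length(ok) < 2 || diff(range(ok)) < 1e-8) {
    return(c(V = b$t0[1], lower = NA_real_, upper = NA_real_))
  }
  ci <- boot.ci(b, conf = conf, type = "perc")
  c(V = b$t0[1], lower = ci$percent[1, 4], upper = ci$percent[1, 5])
}

set.seed(1)
bad_data   <- matrix(c(31, 0, 46, 0, 1, 0), nrow = 3, ncol = 2)
bad_data_3 <- matrix(c(31, 0, 46, 1, 1, 0), nrow = 3, ncol = 2)

r1 <- safe_perc_ci(boot(to_long(bad_data), cramer_v_long, R = 999))
r3 <- safe_perc_ci(boot(to_long(bad_data_3), cramer_v_long, R = 999))
r1  # V = 1 with NA limits: every valid replicate is 1
r3  # finite percentile limits: valid replicates vary
```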

Original response:

I think the solution provided by @TYLim is a good one for this application.

The cramerV() function has a provision to prevent errors when the resampling produces, as in this case, all zeros in the second column.

For example, the following will return a small data frame, but with NAs for the confidence interval.

```r
bad_data_1 <- matrix(c(31, 0, 46, 1, 1, 0), nrow=3, ncol=2)

library(rcompanion)

cramerV(bad_data_1, ci=TRUE)
```

Or, with the reportIncomplete option, it can be forced to produce the confidence interval from the resampled cases that do not have all zeros in the second column.

```r
cramerV(bad_data_1, ci=TRUE, reportIncomplete=TRUE, histogram=TRUE)
```

In the case of the original bad_data matrix, the error appears to occur in the boot() call. As far as I can tell, it's because Cramer's V would be 1 for all the valid cases. I might be able to build a solution to the problem into the function, but it doesn't occur often. The solution @TYLim suggests is probably the best approach, and it would come into play only in specific and extreme cases.

Sal Mangiafico
  • Thanks @SalMangiafico, and thanks for creating the function in the first place! Could you comment a bit on using `reportIncomplete`: what are the potential downsides there? – TY Lim Jun 15 '23 at 20:25
  • I added an amendment to my response with an updated function. – Sal Mangiafico Jun 16 '23 at 14:32
  • In the case of the `cramerV()` function, the function will report `NA` for the confidence interval limits if any of the bootstrap iterations have all zeros in any row or column. The `reportIncomplete` option simply ignores these iterations and returns the confidence interval for the ones that don't meet this criterion. This would bias the results in some way. E.g., for `bad_data`, it's only going to include iterations where that `1` in cell (2, 2) is resampled. This only comes up with sparse tables or small sample sizes, where, honestly, the bootstrapped CIs probably aren't very valid anyway. – Sal Mangiafico Jun 16 '23 at 14:49
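The trade-off described in the last comment can be illustrated on a plain vector of bootstrap replicates. The helper `summarise_reps()` below is hypothetical and only mimics the behaviour described (NA limits if any replicate is incomplete, unless incomplete replicates are dropped); it is not rcompanion's code, and it uses simple percentile limits from `quantile()`. The replicate values are made up to resemble `bad_data`.

```r
# Hypothetical helper mimicking the described reportIncomplete behaviour
summarise_reps <- function(t, conf = 0.95, report_incomplete = FALSE) {
  incomplete <- is.na(t)
  if (any(incomplete) && !report_incomplete) {
    return(c(lower = NA_real_, upper = NA_real_, kept = NA_real_))
  }
  kept <- t[!incomplete]
  alpha <- (1 - conf) / 2
  q <- quantile(kept, c(alpha, 1 - alpha), names = FALSE)
  c(lower = q[1], upper = q[2], kept = mean(!incomplete))  # kept = share of replicates used
}

# Made-up replicates in the spirit of bad_data: every complete replicate is 1
t <- c(1, NA, 1, 1, NA, 1, 1, NA)
summarise_reps(t)                            # NA limits
summarise_reps(t, report_incomplete = TRUE)  # lower = upper = 1, from only 5 of 8 replicates
```

Dropping the incomplete replicates conditions on the (2, 2) observation being drawn, which is why the resulting interval collapses to a single point here.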