
I'm using the MatchIt package on a system with 128GB of RAM.

My data has no NAs. My first attempt, using a generalised linear model for the propensity score (logistic regression by default) with nearest-neighbour matching, worked:

```r
library(MatchIt)

headache6MontsMatch1 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method = "nearest", distance = "glm", data = reducedDF)
```

However, of approximately 100,000 records, about 30,000 were discarded by the matching. I would like to try optimal full matching (`method = "full"`) instead:

```r
headache6MontsMatch2 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method = "full", link = "probit", distance = "glm", data = reducedDF)
```

Unfortunately, this throws the error:

```
NAs produced by integer overflow
Error in if ((nc * nr > getMaxProblemSize()) && warning.requested) { : 
  missing value where TRUE/FALSE needed
```

Looking into `getMaxProblemSize()`, it appears there is a hard limit on the problem size for matching, so I tried:

```r
library(optmatch)

setMaxProblemSize()
```

Checking again with `getMaxProblemSize()` returns `Inf`.

However, I still get the same error. Memory use sits comfortably at around 56GB of the 128GB available, CPU usage is only about 6%, and there is barely any disk activity.

halfer
Anthony Nash
  • What is your `nc * nr` relative to `optmatch:::getMaxProblemSize()`? – Chris Aug 16 '21 at 17:58
  • @Chris - I'd love to know myself. Any idea how I find either? – Anthony Nash Aug 17 '21 at 09:09
  • While scratching around for the exact error, I found that [lines 48-55](https://github.com/markmfredrickson/optmatch/blob/master/R/feasible.R) suggest a method, though I still haven't found the file that raises the exact error. – Chris Aug 17 '21 at 15:43
  • The error may be propagated from somewhere other than `matchit` or `optmatch`, for example from the formula or, in your case, perhaps `glm`, as [here](https://stackoverflow.com/questions/44398763/how-to-resolve-integer-overflow-errors-in-r-estimation). An integer overflow means a calculation has exceeded the largest value the integer type can hold, and setting the maximum problem size to `Inf` doesn't address this. – Chris Aug 17 '21 at 15:56
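On the question of how to find `nc` and `nr`: they are just the numbers of control and treated units. The minimal base-R sketch below assumes a 0/1 treatment column; the `reducedDF` here is a made-up stand-in with hypothetical group sizes, not the asker's data.

```r
# Hypothetical stand-in for reducedDF: 35,000 treated (1) and 65,000 control (0)
reducedDF <- data.frame(
  Headache_past_six_months = c(rep(1L, 35000), rep(0L, 65000))
)

# The distance matrix has one row per treated unit and one column per
# control unit, so nr and nc are simply the group counts.
counts <- table(reducedDF$Headache_past_six_months)
nr <- counts[["1"]]  # number of treated units
nc <- counts[["0"]]  # number of control units

# Multiply as doubles so that this product itself cannot overflow.
problem_size <- as.numeric(nr) * as.numeric(nc)

# TRUE means the same product computed in integer arithmetic would be NA.
problem_size > .Machine$integer.max
```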

1 Answer


This is an unusual error, and it has nothing to do with MatchIt. It arises because R cannot represent very large numbers as integers.

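I assume you have approximately 35,000 treated and 65,000 control units: 1:1 nearest-neighbour matching keeps every treated unit and discards the unmatched controls, so losing about 30,000 of 100,000 records is consistent with those numbers. optmatch computes the problem size as `nc * nr`, where `nc` is the number of control units and `nr` is the number of treated units. optmatch stores these numbers as integers because they are the dimensions of a distance matrix used internally. With `nr = 35000` and `nc = 65000`, the true product is 2,275,000,000, which exceeds the largest value an R integer can hold (`.Machine$integer.max`, 2,147,483,647; see here). R therefore produces `NA` for this value instead, with the warning "NAs produced by integer overflow". Because `NA` cannot be used as an `if` condition, the error is thrown. This is also why `setMaxProblemSize()` did not help: comparing `NA` with `Inf` still gives `NA`.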
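A minimal base-R sketch reproducing the failure, assuming the estimate of 35,000 treated and 65,000 control units. The final `if()` is a simplified stand-in for optmatch's check, not its actual code.

```r
# Assumed group sizes, stored as integers as matrix dimensions would be
nr <- 35000L  # treated
nc <- 65000L  # control

# Integer * integer stays integer in R; 2,275,000,000 exceeds
# .Machine$integer.max, so R returns NA with the warning
# "NAs produced by integer overflow".
size <- suppressWarnings(nc * nr)
is.na(size)

# Comparing NA with anything, even Inf, yields NA, so raising the limit
# with setMaxProblemSize() cannot help: the NA exists before the comparison.
size > Inf

# Simplified version of the check: an NA condition in if() is an error.
res <- tryCatch(
  if (size > Inf) "too big" else "ok",
  error = function(e) conditionMessage(e)
)
res
```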

The only workarounds are to use a smaller sample or to ask the optmatch developers to fix this bug. They could easily fix it by converting `nc` and `nr` to double values before computing `nc * nr`.
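A sketch of the suggested fix, coercing to double before multiplying. `check_problem_size()` is a hypothetical helper written for illustration; it is not optmatch's internal function, and the maintainers' actual patch may differ.

```r
# Hypothetical helper: returns TRUE if the problem exceeds max_size.
check_problem_size <- function(nr, nc, max_size = Inf) {
  # as.numeric() makes the product a double; doubles represent whole
  # numbers exactly up to 2^53, far beyond 2,275,000,000.
  size <- as.numeric(nr) * as.numeric(nc)
  size > max_size
}

# No NA any more, so if() receives a proper TRUE/FALSE.
check_problem_size(35000L, 65000L)       # no limit: FALSE
check_problem_size(35000L, 65000L, 1e9)  # example limit of 1e9: TRUE
```

Because the comparison now always returns `TRUE` or `FALSE`, an `if` statement built on it can no longer fail with "missing value where TRUE/FALSE needed".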


Edit 8/21/21: I contacted the optmatch maintainers and they fixed this issue. It will be corrected in the upcoming version of optmatch.

Noah
  • Thank you Noah. Sorry it's taken so long to reply; work has been extremely busy. Your reply has been extremely helpful. – Anthony Nash Sep 07 '21 at 18:19