I'm using the MatchIt package on a system with 128GB of RAM.
My data does not contain any NAs. My first attempt, nearest-neighbour matching on a propensity score from a generalised linear model (logistic regression by default), worked:
headache6MontsMatch1 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="nearest", distance="glm", data=reducedDF)
However, of approximately 100,000 records, about 30,000 were dropped by the matching, so I would like to try optimal full matching instead:
headache6MontsMatch2 <- matchit(Headache_past_six_months ~ sex + age + townsend + alcohol + smoking, method="full", link="probit", distance="glm", data=reducedDF)
Unfortunately, this throws the error:
NAs produced by integer overflow
Error in if ((nc * nr > getMaxProblemSize()) && warning.requested) { :
  missing value where TRUE/FALSE needed
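As far as I can tell, both messages can be reproduced in plain R without MatchIt, which makes me suspect that the product nc * nr exceeds R's 32-bit integer range. This is only a minimal sketch: the group sizes below are made up so that the product overflows, and the threshold 1e7 is an arbitrary stand-in for the real limit.

```r
# Hypothetical group sizes, chosen only so that their product exceeds
# .Machine$integer.max (2147483647); these are not my real counts.
nr <- 50000L
nc <- 50000L

# Integer multiplication overflows to NA (R also warns
# "NAs produced by integer overflow"; suppressed here to keep output short).
cells <- suppressWarnings(nr * nc)
print(cells)

# Using that NA as an `if` condition raises the second message.
msg <- tryCatch(
  if (cells > 1e7) "too large" else "ok",
  error = function(e) conditionMessage(e)
)
print(msg)
```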
Looking further into getMaxProblemSize(), it appears that I'm restricted to a hard limit on the size of the matching problem. So I've tried:
setMaxProblemSize()
Double-checking the problem size with getMaxProblemSize() then yields Inf.
But I still get the same error. Memory use sits comfortably at around 56GB of the 128GB available, CPU usage is only 6%, and the disk is barely being touched.
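My tentative reading is that raising the limit cannot help if the left-hand side of the comparison is already NA, since NA compared with anything, including Inf, is still NA. The sketch below illustrates that with made-up group sizes; `distance_cells` is a hypothetical helper of my own (not part of MatchIt), and it assumes the treatment variable is coded 0/1.

```r
# Hypothetical helper (not part of MatchIt): count the cells of a
# treated-by-control distance matrix in double precision, which does not
# overflow at these sizes.
distance_cells <- function(treat) {
  n_treated <- sum(treat == 1)
  n_control <- sum(treat == 0)
  as.numeric(n_treated) * as.numeric(n_control)
}

# Toy treatment indicator with made-up group sizes.
treat <- rep(c(1, 0), times = c(50000, 50000))

cells_dbl <- distance_cells(treat)
cells_int <- suppressWarnings(50000L * 50000L)

limit <- Inf
print(cells_int > limit)                 # NA: overflow happens before the comparison
print(cells_dbl > limit)                 # FALSE: a finite double is never above Inf
print(cells_dbl > .Machine$integer.max)  # whether the integer product would overflow
```

On my own data I would call `distance_cells(reducedDF$Headache_past_six_months)` to see whether the product of the two group sizes is above `.Machine$integer.max`.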