I have a data frame of about 100k rows on which I want to run a Cochran–Mantel–Haenszel test.
My two variables are the education level and a computed score grouped into quantiles, my stratifying variable is sex, and the call looks like this:
mantelhaen.test(db$education, db$score.grouped, db$sex)
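When given three factors, mantelhaen.test() cross-tabulates them into a 3-way contingency table first. Building that table by hand makes it possible to inspect the counts the test works with. The real `db` is not shown, so the sketch below uses a simulated stand-in: the level names (`edu1`, `Q1`, ...) and proportions are invented, and only the row count matches the question.

```r
# Hypothetical stand-in for `db`: the real data are not shown, so the level
# names and proportions below are invented for illustration only.
set.seed(1)
n <- 104382
db <- data.frame(
  education     = factor(sample(c("edu1", "edu2", "edu3"), n, replace = TRUE,
                                prob = c(0.85, 0.10, 0.05))),
  score.grouped = factor(sample(paste0("Q", 1:4), n, replace = TRUE)),
  sex           = factor(sample(c("F", "M"), n, replace = TRUE))
)

# The same 3-way table that the factor call is tabulated into.
tab <- table(db$education, db$score.grouped, db$sex)
dim(tab)            # education levels x score groups x sex strata
storage.mode(tab)   # "integer": table() stores counts as integers
apply(tab, 3, sum)  # number of observations in each sex stratum
```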
This throws the following error and warning:
Error in qr.default(a, tol = tol) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In ntot * rowsums : NAs produced by integer overflow
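For context on what the warning refers to: R stores integers in 32 bits, and multiplying two integers keeps the result an integer, so a product of two moderately large counts can exceed the representable range and become NA. A minimal illustration with arbitrary numbers (50,000 is chosen only because it is roughly the size of each sex stratum in a 100k-row data set, assuming a roughly even split):

```r
# R integers are 32-bit; this is the largest value they can hold.
.Machine$integer.max                 # 2147483647

# Products of two integers stay integers, so counts of this size overflow.
a <- 50000L
b <- 50000L
prod_int <- suppressWarnings(a * b)  # NA ("NAs produced by integer overflow")
prod_int

# Converting one operand to double avoids the overflow.
prod_dbl <- as.numeric(a) * b        # 2.5e+09
prod_dbl
```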
The error seems to depend on the first variable: of the 7 variables I tried in that position, only 2 trigger it, and those 2 have nothing obvious in common.
Missing values and factor levels don't seem to differ between the variables that throw the error and those that don't. I also tried with complete cases only (using na.omit()), and the problem persists.
What triggers this error, and what does it mean?
How can I get rid of it?
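One possible workaround, assuming the overflow comes from integer arithmetic on the table counts (as the `ntot * rowsums` warning suggests): build the contingency table yourself, store its counts as doubles, and pass the 3-dimensional table to mantelhaen.test(), which accepts that form as well as three factors. The data below are an invented stand-in for `db`; only the row count matches the question.

```r
# Hypothetical stand-in data (invented): one dominant education level and
# roughly 52,000 observations per sex stratum.
set.seed(1)
n <- 104382
db <- data.frame(
  education     = factor(sample(c("edu1", "edu2", "edu3"), n, replace = TRUE,
                                prob = c(0.85, 0.10, 0.05))),
  score.grouped = factor(sample(paste0("Q", 1:4), n, replace = TRUE)),
  sex           = factor(sample(c("F", "M"), n, replace = TRUE))
)

# mantelhaen.test() also accepts a 3-dimensional contingency table.
tab <- table(db$education, db$score.grouped, db$sex)

# Assumption: the overflow comes from integer arithmetic on these counts.
# Stored as doubles, products of this size are exact (doubles represent
# whole numbers exactly up to 2^53).
storage.mode(tab) <- "double"

res <- mantelhaen.test(tab)
res$statistic   # generalized CMH statistic
res$parameter   # df = (3 - 1) * (4 - 1) = 6
res$p.value
```

The counts are unchanged, only their storage type differs, so under that assumption the statistic should be the one the factor call would produce if its arithmetic did not overflow.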
Related posts: "R: NA/NaN/Inf in foreign function call (arg 1)" and "What is integer overflow in R and how can it happen?"
ADDENDUM: here is the output of str() (the failing variables are education and imc.cl):
str(db[c("education","score.grouped","sex", ...)])
'data.frame': 104382 obs. of 7 variables:
$ age.cl: Ord.factor w/ 5 levels "<30 ans"<"30-40 ans"<..: 5 2 1 1 3 4 2 3 4 4 ...
..- attr(*, "label")= chr "age"
$ emploi2 : Factor w/ 8 levels "Agriculteurs exploitants",..: 3 5 6 8 8 8 8 3 3 3 ...
..- attr(*, "label")= chr "CSP"
$ tabac : Factor w/ 4 levels "ancien fumeur",..: 4 1 4 4 3 4 4 1 4 4 ...
..- attr(*, "label")= chr "tabac"
$ situ_mari2 : Factor w/ 3 levels "Vit seul","Divorsé, séparé ou veuf",..: 3 2 1 1 1 3 1 3 2 3 ...
..- attr(*, "label")= chr "marriage"
$ education : Factor w/ 3 levels "Universitaire",..: 1 1 1 1 3 1 1 1 1 1 ...
$ revenu.cl : Factor w/ 4 levels "<1800 euros/uc",..: 3 4 2 NA 4 1 1 4 4 1 ...
$ imc.cl : Ord.factor w/ 6 levels "Maigre"<"Normal"<..: 2 2 1 2 3 1 3 2 2 3 ...
..- attr(*, "label")= chr "IMC"
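A rough check for why only some variables fail, under two assumptions: that the overflow is the product of a stratum total with the margin of one level of the first variable (the warning names `ntot * rowsums`), and that, as in Agresti's formulation of the generalized CMH statistic, the margins used are those of all levels except the last. If so, what matters is whether some non-last level is large. In the str() output above, the first values of education are mostly level 1 and those of imc.cl mostly level 2 (neither being the last level), while tabac sits mostly at its last level; running a check like this on the real data would show whether that pattern actually explains the failures. The helper `max_margin_product()` and the simulated columns are hypothetical.

```r
# Hypothetical stand-in data (invented): `education` has a dominant first
# level, `balanced3` has three equally likely levels.
set.seed(1)
n <- 104382
db <- data.frame(
  education     = factor(sample(c("edu1", "edu2", "edu3"), n, replace = TRUE,
                                prob = c(0.85, 0.10, 0.05))),
  balanced3     = factor(sample(c("b1", "b2", "b3"), n, replace = TRUE)),
  score.grouped = factor(sample(paste0("Q", 1:4), n, replace = TRUE)),
  sex           = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Hypothetical helper: largest (stratum total) x (margin of one level of x)
# over all strata, dropping the last level of x. Computed in double
# precision so the check itself cannot overflow.
max_margin_product <- function(x, y, z) {
  tab <- table(x, y, z)
  I <- dim(tab)[1]
  max(apply(tab, 3, function(f) as.numeric(sum(f)) * rowSums(f)[-I]))
}

candidates <- c("education", "balanced3")
risk <- sapply(candidates, function(v)
  max_margin_product(db[[v]], db$score.grouped, db$sex))
risk
risk > .Machine$integer.max   # TRUE where integer arithmetic would overflow
```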
EDIT: digging into the function, the error comes from a call to qr.solve(). I don't understand much about this, but I'll try to dig deeper.
EDIT2: inside qr.solve(), the error is thrown by a Fortran call to .F_dqrdc2. This is so far beyond my level that my nose is starting to bleed.
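The error from EDIT and EDIT2 can be reproduced in isolation: qr.solve() goes through qr.default(), whose compiled routine rejects any NA, NaN or Inf in its input. That would suggest the Fortran call is only where the problem surfaces; if an integer overflow upstream has already turned entries into NA, the matrix reaching qr.solve() is the real issue. A minimal sketch with a made-up 2 x 2 matrix:

```r
# A small matrix with one NA entry, standing in for a variance matrix whose
# entries were turned into NA by an integer overflow earlier on.
V <- matrix(c(2, NA, 1, 3), nrow = 2)
b <- c(1, 1)

# qr.solve() -> qr.default() hands the matrix to compiled code (dqrdc2),
# which refuses NA/NaN/Inf values.
msg <- tryCatch(qr.solve(V, b), error = conditionMessage)
msg

# With the NA replaced by a finite value, the same call succeeds.
V[2, 1] <- 1
sol <- qr.solve(V, b)
sol   # 0.4 0.2
```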
EDIT3: I tried using head() and tail() on my data to find out which row is causing it:
library(magrittr)  # provides %>%
db2 = db %>% head(99787)  # runs fine; head(99788) fails
db2 = db %>% tail(99698)  # runs fine; tail(99699) fails
mantelhaen.test(db2$education, db2$score.grouped, db2$sex)
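If the problem is a count crossing the 32-bit integer limit rather than one bad row, the head() and tail() thresholds would mark where a cumulative count becomes large enough, not a specific faulty observation; head() failing at 99,788 rows and tail() at 99,699 would both be consistent with that. The manual search can be automated with a bisection. A minimal sketch on invented data, where `first_failing_n()` and `overflows()` are hypothetical helpers and the predicate checks the assumed overflow condition (stratum total times an education margin) rather than running the test itself:

```r
# Hypothetical stand-in data (invented), as a placeholder for `db`.
set.seed(1)
n <- 104382
db <- data.frame(
  education     = factor(sample(c("edu1", "edu2", "edu3"), n, replace = TRUE,
                                prob = c(0.85, 0.10, 0.05))),
  score.grouped = factor(sample(paste0("Q", 1:4), n, replace = TRUE)),
  sex           = factor(sample(c("F", "M"), n, replace = TRUE))
)

# Hypothetical helper: smallest k in 1..n_max with fails(k) TRUE, assuming
# fails() is monotone (FALSE for small k, TRUE from some k onwards).
first_failing_n <- function(n_max, fails) {
  if (!fails(n_max)) return(NA_integer_)
  lo <- 1L
  hi <- as.integer(n_max)
  while (lo < hi) {
    mid <- (lo + hi) %/% 2L
    if (fails(mid)) hi <- mid else lo <- mid + 1L
  }
  lo
}

# Hypothetical predicate: does head(db, k) push some
# (stratum total) x (education margin) product past the integer limit?
overflows <- function(k) {
  d <- head(db, k)
  tab <- table(d$education, d$score.grouped, d$sex)
  I <- dim(tab)[1]
  prods <- apply(tab, 3, function(f) as.numeric(sum(f)) * rowSums(f)[-I])
  any(prods > .Machine$integer.max)
}

k_star <- first_failing_n(nrow(db), overflows)
k_star
```

Because head() only adds rows as k grows, every count is non-decreasing in k, which is what makes the bisection valid here. To bisect on the actual failure instead, `fails` could be `function(k) { d <- head(db, k); inherits(try(mantelhaen.test(d$education, d$score.grouped, d$sex), silent = TRUE), "try-error") }`.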
This doesn't tell me much, but maybe it will mean something to you.