I am trying to run a lme model with these data:

tot_nochc=runif(10,1,15)
cor_partner=factor(c(1,1,0,1,0,0,0,0,1,0))
age=runif(10,18,75)
agecu=age^3
day=factor(c(1,2,2,3,3,NA,NA,4,4,4))
dt=as.data.frame(cbind(tot_nochc,cor_partner,agecu,day))
attach(dt)

corpart.lme.1=lme(tot_nochc~cor_partner+agecu+cor_partner *agecu, 
                  random = ~cor_partner+agecu+cor_partner *agecu |day, 
                  na.exclude(day))

I get this error code:

Error in na.fail.default(list(cor_partner = c(1L, 1L, 2L, 1L, 1L, 1L, : missing values in object

I am aware there are similar questions in the forum. However, in my case:

  • cor_partner has no missing values;
  • the whole object is coded as a factor (at least from what the Global Environment shows).

I could exclude those NA values with an na.action, but I'd rather know why the function is reading missing values - to understand exactly what is happening to my data.

Ferdi
InverniE
  • this looks like a typo/thinko to me. Can you explain what `na.exclude(day)` is supposed to be doing? I would generally do this by adding `day` to the data frame, then **not** using `attach()`, but instead using the combined data frame-including `day`- in the `data` argument ... ?? – Ben Bolker Jul 07 '16 at 16:54
  • also, in the data set you give there are only 8 values of `day`, and 10 values of all of the other variables, so I get a "variable lengths differ" error ... – Ben Bolker Jul 07 '16 at 16:56
  • This was an example matrix, not the data I am actually using. `day` is part of the dt matrix and has 10 values, including NAs; I have edited the question. – InverniE Jul 07 '16 at 17:10

4 Answers

tl;dr you have to use na.exclude() (or whatever) on the whole data frame at once, so that the remaining observations stay matched up across variables ...

set.seed(101)
tot_nochc=runif(10,1,15)
cor_partner=factor(c(1,1,0,1,0,0,0,0,1,0))
age=runif(10,18,75)
agecu=age^3
day=factor(c(1,2,2,3,3,NA,NA,4,4,4))
## use data.frame() -- *DON'T* cbind() first
dt=data.frame(tot_nochc,cor_partner,agecu,day)
## DON'T attach(dt) ...
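
To see why the `cbind()` step matters, here is a minimal sketch with toy vectors (`score` and `group` are made-up names, not the question's data). `cbind()` builds a single numeric matrix, so a factor is reduced to its internal integer codes, while `data.frame()` keeps each column's own type:

```r
# Toy vectors (hypothetical names, not the question's data)
score <- c(2.5, 7.1, 4.8)
group <- factor(c("1", "0", "1"))

# cbind() returns a numeric matrix: the factor becomes its integer codes
dt_bad <- as.data.frame(cbind(score, group))
class(dt_bad$group)    # "numeric"
dt_bad$group           # 2 1 2 (level codes, not the labels "1", "0", "1")

# data.frame() keeps the factor as a factor
dt_good <- data.frame(score, group)
class(dt_good$group)   # "factor"
levels(dt_good$group)  # "0" "1"
```

With the `cbind()` version, a model would treat `group` as a continuous predictor, which is probably not what was intended.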

Now try:

library(nlme)
corpart.lme.1=lme(tot_nochc~cor_partner+agecu+cor_partner *agecu, 
              random = ~cor_partner+agecu+cor_partner *agecu |day, 
              data=dt,
              na.action=na.exclude)

We get convergence errors and warnings, but I think that's now because we're using a tiny made-up data set without enough information in it and not because of any inherent problem with the code.
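
As far as I can tell, the original error arises because `na.exclude(day)` is the first unnamed argument after the formula, so it gets matched to `lme()`'s `data` parameter rather than to `na.action`, which stays at its default `na.fail`. Below is a small sketch of the "stay matched up" point. It uses toy data and base R's `lm()` as a stand-in for `lme()` (an assumption made only so that it runs without nlme):

```r
# Toy data (hypothetical, not the question's)
x   <- c(10, 20, 30, 40, 50)
day <- factor(c(1, 2, NA, 2, NA))
dt  <- data.frame(x, day)

# Filtering a single vector shortens only that vector
day_only <- na.exclude(day)
length(day_only)   # 3
length(x)          # 5, so x and day_only no longer line up

# Filtering the data frame drops whole rows, keeping columns aligned
dt_clean <- na.exclude(dt)
nrow(dt_clean)     # 3
dt_clean$x         # 10 20 40

# Passed as na.action, na.exclude also pads residuals back to the
# original length, with NA in the dropped rows
fit <- lm(x ~ day, data = dt, na.action = na.exclude)
res <- residuals(fit)
length(res)        # 5
which(is.na(res))  # positions 3 and 5
```

The padding is the practical difference from `na.omit`: residuals and fitted values can be put straight back alongside the original rows.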

Ben Bolker
  • Thanks, it works without any warning on the actual data. I thought that na.exclude(day) would automatically exclude the whole row based on the value in "day", not work at single column value, so good to know! – InverniE Jul 08 '16 at 13:40

The randomForest package has an na.roughfix function that "imputes Missing Values by median/mode".

You can use it as follows:

library(randomForest)
fit_rf <- randomForest(store ~ .,
        data = store_train,
        importance = TRUE,
        proximity = TRUE,
        na.action = na.roughfix)
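
To make "median/mode" concrete, here is a rough base-R sketch of that kind of imputation. `rough_impute` is a hypothetical helper written only for illustration, not the package's implementation (for one thing, it resolves ties by taking the first most frequent level):

```r
# Hypothetical helper: fill NAs with the column median (numeric)
# or the most frequent level (factor). Illustration only.
rough_impute <- function(df) {
  for (col in names(df)) {
    v <- df[[col]]
    if (!anyNA(v)) next
    if (is.numeric(v)) {
      v[is.na(v)] <- median(v, na.rm = TRUE)
    } else if (is.factor(v)) {
      counts <- table(v)  # table() leaves NA out by default
      v[is.na(v)] <- names(counts)[which.max(counts)]
    }
    df[[col]] <- v
  }
  df
}

# Toy data (made up for this example)
toy <- data.frame(
  size = c(1, NA, 3, 10),
  kind = factor(c("a", "b", NA, "b"))
)
fixed <- rough_impute(toy)
fixed$size                # 1 3 3 10
as.character(fixed$kind)  # "a" "b" "b" "b"
```

Note that this fills values in rather than dropping rows, so the fitted model sees data that differ from what was observed.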
David Arenburg
kurapati

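If your data contain NA values, you can pass na.action = na.roughfix. It does not leave the data unchanged: the missing entries are filled in (medians for numeric columns, the most frequent level for factors) before the forest is fitted, so no rows are dropped.

rf <- randomForest(target ~ ., data = train,
                   na.action = na.roughfix)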
UseR10085

Another possible solution is to drop the incomplete rows first with data <- na.omit(train) and then fit the model on data. Note that this removes every row that has an NA in any column.
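
A quick sketch of what that does, using a made-up `train` data frame (the column names are assumptions for illustration):

```r
# Toy training data (hypothetical)
train <- data.frame(
  target = c(1.2, 3.4, NA, 5.6),
  x1     = c(10, NA, 30, 40)
)

data <- na.omit(train)
nrow(train)  # 4
nrow(data)   # 2, rows 2 and 3 each had an NA somewhere

# The dropped row indices are recorded on the result
dropped <- as.integer(attr(data, "na.action"))
dropped      # 2 3
```

Checking `nrow()` before and after is a cheap way to see how much data is being discarded.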

patrickmdnet
Benjamin Diaz