
I have a dataset from which I have filtered out the NA values, and I plan to fit a generalized linear model to predict total_score. However, when I do this:

     model<-bayesglm(total_score ~ ., data=traint)

I get the following error:

     Error in lm.fit(x = x.star[good.star, , drop = FALSE] * w.star, y = z.star *  : 
         NA/NaN/Inf in 'y'

Based on a previous post, "lm() NA/NaN/Inf error", I am trying to eliminate the non-finite values from the dataset.

Having checked for such non-finite values with:

     summary(timesData)

     output too long to show

I can't find anything in the output that tells me how to subset the data so that these non-finite values are filtered out.

My attempt at doing this anyway is as follows:

    train<-subset(timesData, !is.finite(timesData))

Naturally, as I haven't specified a column I get:

    Error in is.finite(timesData) : 
     default method not implemented for type 'list'

I tried lapply:

    lapply(timesData, byrow=F, is.finite(timesData))

but

    Error in FUN(X[[i]], ...) : 
       2 arguments passed to 'is.finite' which requires 1

So overall my question is: how do I find the non-finite values in the dataset when 'summary()' doesn't reveal the columns where they reside, and how can I then use lapply to get rid of them?

My data is publicly available at kaggle: https://www.kaggle.com/mylesoneill/world-university-rankings

johnny utah

3 Answers

    df = data.frame(
      a = c(2, 4/0, 5),
      b = c(1/0, 3, 5),
      c = c(4, 3, 5))
    df
        a   b c
    1   2 Inf 4
    2 Inf   3 3
    3   5   5 5

    # which columns have infinite values
    is.infinite(colSums(df))
        a     b     c 
     TRUE  TRUE FALSE 

    # only rows with finite values
    df[is.finite(rowSums(df)), ]
      a b c
    3 5 5 5

    # or with apply and all
    df[apply(apply(df, 2, is.finite), 1, all), ]
      a b c
    3 5 5 5
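
The `colSums` and `rowSums` approach needs every column to be numeric, which may not hold for a real dataset. As a rough sketch only, the same idea can be restricted to the numeric columns; the data frame `toy` and its values are made up here to stand in for `timesData`:

```r
# Toy data frame standing in for timesData (hypothetical values).
toy <- data.frame(
  name  = c("u1", "u2", "u3", "u4"),
  score = c(90, Inf, NA, 70),
  cites = c(1, 2, 3, NaN),
  stringsAsFactors = FALSE
)

# Restrict to numeric columns; is.finite() is FALSE for every character value.
num_cols <- names(toy)[sapply(toy, is.numeric)]

# Number of non-finite entries (NA, NaN, Inf, -Inf) in each numeric column.
bad_counts <- sapply(toy[num_cols], function(col) sum(!is.finite(col)))

# Keep only rows where every numeric column is finite.
keep <- Reduce(`&`, lapply(toy[num_cols], is.finite))
clean <- toy[keep, ]
```

Printing `bad_counts` shows which columns hold the problem values, which is the part `summary()` does not make obvious.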
Cabana

Here is something you can try. It is strange that `is.finite` and `is.infinite` don't support data frames, though, since a similar function like `is.na` does:

    timesData[apply(timesData, 1, function(row) all(is.finite(row))), ]

An alternative method is to convert timesData to a matrix and then use `is.finite` or `is.infinite`, both of which work on a matrix:

    timesData[rowSums(is.infinite(as.matrix(timesData))) == 0, ]

This should usually be faster than the apply method.
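
Note that the two filters are not equivalent: `is.infinite` only flags `Inf` and `-Inf`, while `!is.finite` also flags `NA` and `NaN`, any of which can trigger the `NA/NaN/Inf` error. A small sketch of the difference, using a made-up data frame `toy` rather than the real data:

```r
# Hypothetical toy data with one Inf, one NA and one NaN.
toy <- data.frame(a = c(1, Inf, NA, 4), b = c(1, 2, 3, NaN))
m <- as.matrix(toy)

# Rows without +/-Inf only: the NA and NaN rows survive this filter.
no_inf <- toy[rowSums(is.infinite(m)) == 0, ]

# Rows where every value is finite: also drops the NA and NaN rows.
all_finite <- toy[rowSums(!is.finite(m)) == 0, ]
```

Both variants assume all columns are numeric; with a character column, `as.matrix` produces a character matrix, on which `is.finite` is `FALSE` everywhere.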

Psidom
  • Just replace `"is.na"` with `"is.finite"` in `is.na.data.frame` and voilà, you have a method for data frames. :) Probably why there isn't a method. Too easy. – Rich Scriven Jun 14 '16 at 20:00

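One way to solve it is to loop over the columns and replace non-finite values with some sensible value, such as zero or -1 (depending on your data). For example:

    # Replace non-finite values (NA, NaN, Inf, -Inf) with -1, numeric columns only
    for (f in names(timesData)) {
      if (is.numeric(timesData[[f]])) {
        timesData[[f]] <- ifelse(is.finite(timesData[[f]]), timesData[[f]], -1)
      }
    }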
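
If a placeholder like -1 would distort the fit, one possible alternative (a sketch, not necessarily what suits this dataset) is to recode non-finite entries as `NA` and then drop the incomplete rows. The data frame `toy` below is invented for illustration:

```r
# Hypothetical toy data with Inf, NaN and -Inf entries.
toy <- data.frame(x = c(1, Inf, 3, NaN), y = c(10, 20, -Inf, 40))

# Turn every non-finite entry into NA, column by column (numeric columns only).
toy[] <- lapply(toy, function(col) {
  if (is.numeric(col)) col[!is.finite(col)] <- NA
  col
})

# Drop rows that contain any NA.
clean <- toy[complete.cases(toy), ]
```

This loses rows instead of inventing values, so which option is preferable depends on how much data is affected.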
Matias Thayer