28

Say I have a data.frame a.

I fit a model with

m.fit <- lm(col2 ~ col3 * col4, data = a, na.action = na.exclude)

col2 has some NA values, col3 and col4 have values less than 1.

I keep getting

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
NA/NaN/Inf in foreign function call (arg 1)

I've checked the mailing list, and it appears that this error is caused by the NAs in col2. However, I tried na.action = na.exclude, na.omit and na.pass, and none of them seems to work. I've tested lm again on the first 10 entries, and the error is definitely not caused by the NAs. The problem with this error is that every Google result seems to point at NAs.

Did I misinterpret the error or am I using lm wrongly?

Data is at kaggle. I'm modelling MonthlyIncome data using linear regression (as I couldn't get a certain glm family to work). I've created my own variables to use but if you try to model MonthlyIncome with variables already present it fails.

zx8754
Pk.yd
  • `m.fit<-lm(col2 ~ col3 + col4 + col3*col4, data=a, na.action=na.exclude)` is much more readable for specifying your model – mindless.panda Dec 07 '11 at 13:12
  • Without a reproducible example it is very hard to answer your question. Please see http://stackoverflow.com/q/5963269/567015 for instructions on how to do this. – Sacha Epskamp Dec 07 '11 at 13:13
  • If you subset `a` for rows with no NA in `col2` and then run the `lm()`, do you still get the error? – mindless.panda Dec 07 '11 at 13:15
  • For what it's worth, `~ col3*col4` is equivalent to `~ col3+col4+col3:col4` which is in turn equivalent to `~ col3+col4+col3*col4` (the last is harmlessly redundant) – Ben Bolker Dec 07 '11 at 13:24
  • Thanks Ben you are right, I misread a dot in my notes. – Pk.yd Dec 07 '11 at 13:33
  • Not much point in posting a link to a datafile behind a login screen. Meanwhile, try plotting your data to see if it looks even vaguely linear. – Carl Witthoft Dec 07 '11 at 13:57
  • Oh damn, I knew this would happen :S. But yeah, a smaller section of the data works just fine; the problem is when you use all the observations. And linear regression is fairly logical given the data, but definitely not something I would want to use if I had better control over R. – Pk.yd Dec 07 '11 at 14:00
  • @Pk.yd : get a dropbox and use the public links in there (www.dropbox.com). That's still the cleanest solution to share data I know of. – Joris Meys Dec 07 '11 at 14:27
  • I signed up for Kaggle, and I can't replicate. `a <- read.csv("~/Downloads/cs-training.csv")`; `names(a)[2:4] <- paste("col",2:4,sep="")`; `m.fit <-lm(col2~col3*col4,data=a)` worked fine for me. – Ben Bolker Dec 07 '11 at 14:32
  • Which columns are you using in the `lm` fit? If you use the names in the header row in the file, it's clearer than `col2`, etc. I've tried a few column combinations and can't reproduce your error. – Richie Cotton Dec 07 '11 at 14:32
  • So does `a <- read.csv("~/Downloads/cs-training.csv")`; `m.fit <-lm(MonthlyIncome~age*DebtRatio*SeriousDlqin2yrs,data=a,na.action=na.exclude)` – Ben Bolker Dec 07 '11 at 14:35
  • Urgh, sorry everyone. It seems the problem was Inf values in my custom column, which I fixed after a good night's sleep... Once again, very very sorry for wasting time. – Pk.yd Dec 07 '11 at 20:57

11 Answers

39

I know this thread is really old, but the answers don't seem complete, and I just ran into the same problem.

In my case, the columns that had NAs also contained NaN and Inf. Replace those with NA and try again. Specifically:

# is.na() is TRUE for NaN but not for Inf, so recode both explicitly
col2[is.nan(col2)] <- NA
col2[is.infinite(col2)] <- NA   # catches both Inf and -Inf
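
A minimal sketch of this fix, assuming a hypothetical data frame `a` with made-up values in which `col3` contains `Inf`, `-Inf` and `NaN`. The default NA handlers drop rows with `NA`/`NaN` but let infinite values through to `lm.fit`, so the sketch recodes every non-finite value to `NA` before fitting:

```r
# Hypothetical toy data: col2 has an NA, col3 has Inf, NaN and -Inf
a <- data.frame(
  col2 = c(1.0, 2.1, NA, 3.9, 5.2, 6.1, 6.8, 8.3, 9.1, 9.9),
  col3 = c(0.1, 0.2, 0.3, Inf, 0.5, NaN, 0.7, 0.8, -Inf, 0.95),
  col4 = c(0.9, 0.7, 0.5, 0.4, 0.2, 0.1, 0.6, 0.3, 0.8, 0.15)
)

# Fitting the raw data fails: na.exclude removes NA/NaN rows but keeps Inf
fit_raw <- tryCatch(
  lm(col2 ~ col3 * col4, data = a, na.action = na.exclude),
  error = function(e) e
)

# Recode every non-finite value (NA, NaN, Inf, -Inf) in numeric columns to NA
num_cols <- vapply(a, is.numeric, logical(1))
a[num_cols] <- lapply(a[num_cols], function(v) {
  v[!is.finite(v)] <- NA
  v
})

# Now na.exclude can drop the affected rows and the fit goes through
m.fit <- lm(col2 ~ col3 * col4, data = a, na.action = na.exclude)
```

With `na.exclude`, `residuals(m.fit)` is padded with `NA` so it still lines up with the rows of `a`.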

Hope that helps your 18 month old question!

slammaster
  • Thanks for this suggestion. Adding that in case you have some -Inf, make sure to make those NAs as well. That solved my problem. – bsg May 25 '14 at 06:06
  • For a one-liner: `col2[which(!is.finite(col2))] = NA` – Hugh May 26 '14 at 06:26
  • As I said, the root cause of the problem is log(0) = -Inf; the log of zero fails to plot in that case. As I understand it, your approach effectively replaces those data values with NA and so omits those rows. If so, I guess you end up with no error, but likely not the same box plot? – algarecu Nov 30 '17 at 15:48
10

You should read the book A Beginner’s Guide to R for a complete explanation of this. Specifically, it mentions the following error:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok,...): NA/NaN/Inf in foreign function call (arg 4)

The solution is to add a small constant value to the Intensity data, for example, 1. Note that there is an on-going discussion in the statistical community concerning adding a small value. Be that as it may, you cannot use the log of zero when doing calculations in R.
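
As a small illustration of that workaround (the `Intensity` values below are made up, not taken from the book), adding 1 before taking the log keeps zeros finite. Base R's `log1p()` computes the same quantity:

```r
# Hypothetical intensity measurements, including zeros
Intensity <- c(0, 3, 10, 0, 25)

raw_log     <- log(Intensity)       # -Inf wherever Intensity is 0
shifted_log <- log(Intensity + 1)   # finite everywhere
same_thing  <- log1p(Intensity)     # equivalent to log(Intensity + 1)
```

Whether shifting the data like this is appropriate depends on the analysis; the sketch only shows that it removes the `-Inf` values.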

algarecu
8

I just ran into another possibility, after all possible na.omit and na.exclude checks.

I was fitting something like:

lm(log(x) ~ log(y), data = ...)

Without noticing that, for some values in my dataset, x or y could be zero: log(0) = -Inf

So just another thing to watch out for!
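
One way to spot this before fitting, sketched with a hypothetical data frame `d`: build the model frame with `na.action = na.pass` so nothing is dropped, then count the non-finite values in each transformed term.

```r
# Hypothetical data: one zero in x and one zero in y
d <- data.frame(x = c(1, 2, 0, 4, 5), y = c(2, 0, 3, 5, 8))

# Evaluate the formula terms without dropping any rows
mf <- model.frame(log(x) ~ log(y), data = d, na.action = na.pass)

# Number of NA/NaN/Inf/-Inf values per term; anything above 0 will break lm.fit
bad <- vapply(mf, function(v) sum(!is.finite(v)), integer(1))
bad
```

Because the check runs on the model frame, it sees the values after `log()` has been applied, which is where the `-Inf` first appears.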

arredond
2

I solved this type of problem by resetting my options: options(na.action="na.exclude") or options(na.action="na.omit").

I checked my settings and found that I had previously changed the option to "na.pass", which didn't drop my y observations with NAs (where y~x).
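
A short sketch of how the global option changes the outcome, using a hypothetical data frame `d`. `lm()` falls back to `getOption("na.action")` when no `na.action` argument is given:

```r
# Hypothetical data with one missing response value
d <- data.frame(x = 1:5, y = c(1.2, NA, 2.9, 4.1, 5.0))

old <- options(na.action = "na.pass")   # NA rows are passed straight to lm.fit
res_pass <- tryCatch(lm(y ~ x, data = d), error = function(e) e)

options(na.action = "na.omit")          # rows with NA are dropped before fitting
res_omit <- lm(y ~ x, data = d)

options(old)                            # restore whatever was set before
```

Here `res_pass` ends up holding an error condition, while `res_omit` is a model fitted on the four complete rows.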

Jaap
Andrew
1

Try changing the type of col2 (and all other variables)

col2 <- as.integer(col2)
0

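I just encountered the same problem. Get the finite elements using

# keep only rows where the interaction term is finite (drops NA, NaN and +/-Inf)
finiteElements = which(is.finite(data$col3 * data$col4))
finiteData = data[finiteElements, ]
lm(col2 ~ col3 * col4, na.action = na.exclude, data = finiteData)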
d2a2d
0

I encountered this error when my equivalent of col2 was an integer64 rather than an integer and I was using natural and polynomial splines (splines::ns and splines::bs), for example:

m.fit <- lm(col1 ~ ns(col2))
m.fit <- lm(col1 ~ bs(col2, degree = 3))

Converting to a standard integer worked for me:

m.fit <- lm(col1 ~ ns(as.integer(col2)))
m.fit <- lm(col1 ~ bs(as.integer(col2), degree = 3))
Scott Kaiser
0

I got this error when I swapped the arguments in a call to reformulate and used the resulting formula in my lm call without checking, so the predictor and response variables were the wrong way round.
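
For reference, a minimal sketch of the argument order, reusing the column names from the question: `reformulate()` takes the predictor labels first and the response second, so naming the `response` argument and printing the formula before fitting avoids the mix-up.

```r
# Predictors go in termlabels (first argument), the response is named explicitly
f <- reformulate(c("col3", "col4"), response = "col2")
f_text <- deparse(f)   # character form of the formula, handy for a quick check
print(f)
```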

moodymudskipper
0

Another thing to watch out for is that functions like log() can make your x's and y's infinite, e.g. log(0) = -Inf. Zeros can also come out of other transformations, e.g. mathematically sin(pi) = 0.

0

This is what helped in my case: I passed lm data that already excluded NAs and Infs.

# is.finite() is FALSE for NA, NaN, Inf and -Inf
lm(y ~ x, data = data[is.finite(data$y), ])
Eldorado
-1

Make sure you don't have any 0 in your dependent variable if you log-transform it, since log(0) = -Inf.