6

I am trying to fit a regression in R. With the following code, importing the CSV file works without any problem:

    dat <- read.csv('http://pastebin.com/raw.php?i=EWsLjKNN', sep = ";")
    dat  # OK, works fine
    Regdata <- lm(Y ~ ., na.action = na.omit, data = dat)
    summary(Regdata)

However, when I run the regression it does not work. I get this error message:

Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  0 (non-NA) cases

All values in my CSV file are numbers, and an empty "cell" holds the value "NA". Some columns are fully populated, while other rows are sometimes empty and hold the NA value.

So I don't understand why I get an error message even with:

na.action=na.omit

PS: The CSV data are available at: http://pastebin.com/EWsLjKNN

Brandon Bertelsen
S12000

2 Answers

6

You get this error message because every row of your data frame contains at least one missing value. You can check this, for example, with this code:

    # `dat` is the data frame read in the question; counts the NAs in each row
    apply(dat, 1, function(x) sum(is.na(x)))
     [1] 128 126  82  78  73  65  58  34  31  30  28  30  20  21  12  20  17  16  12  42  50 128

So when you run the regression with lm() and na.action=na.omit, all rows of the data frame are removed and there is no data left to fit the regression.
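To see the mechanism on a small scale, here is a minimal sketch with made-up data. The `toy` data frame and its values are assumptions for illustration only, not the asker's CSV. Every row has exactly one NA, so nothing survives `na.omit` and `lm()` stops before it can fit anything:

```r
# Toy data (hypothetical, not the asker's CSV): every row has one NA.
toy <- data.frame(
  Y  = c(1.0, 2.1, NA, 4.2),
  X1 = c(NA, 0.5, 1.5, 2.5),
  X2 = c(3.0, NA, 5.0, NA)
)

# Number of missing values per row; same idea as the apply() call above.
na_per_row <- rowSums(is.na(toy))

# The rows kept by na.omit are exactly the complete cases.
n_complete   <- sum(complete.cases(toy))
n_after_omit <- nrow(na.omit(toy))

# lm() drops every row, so the fit fails; try() captures the error
# instead of aborting the script.
fit <- try(lm(Y ~ ., data = toy, na.action = na.omit), silent = TRUE)
fit_failed <- inherits(fit, "try-error")
```

Checking `sum(complete.cases(dat))` before calling `lm()` is a quick way to find out how many rows the model would actually see.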

But this is not the main problem. If the data you provided contains all the information you have, then you are trying to fit a regression with 165 independent variables (X variables) while having only 22 observations. The number of independent variables has to be less than the number of observations.
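As a rough sketch of that check, the helper below compares the number of complete rows with the number of predictor columns. `check_design` is a hypothetical function written for this illustration (it is not part of base R), and the `tall` and `wide` data frames are made up. It assumes every column other than the response is a numeric predictor and that the model includes an intercept:

```r
# Hypothetical helper (not part of base R): compare the number of complete
# rows with the number of predictor columns for a response named `response`.
check_design <- function(df, response = "Y") {
  n_complete   <- sum(complete.cases(df))
  n_predictors <- length(setdiff(names(df), response))
  list(
    n_complete      = n_complete,
    n_predictors    = n_predictors,
    # An intercept plus p slopes needs more than p + 1 complete rows
    # to leave any residual degrees of freedom.
    has_residual_df = n_complete > n_predictors + 1L
  )
}

# Made-up data: 5 complete rows, 2 predictors.
tall <- data.frame(Y = c(1, 2, 3, 4, 5),
                   X1 = c(2, 1, 4, 3, 5),
                   X2 = c(0, 1, 0, 1, 1))

# Made-up data: 3 complete rows, 4 predictors.
wide <- data.frame(Y = c(1, 2, 3),
                   X1 = c(1, 0, 2), X2 = c(0, 1, 1),
                   X3 = c(2, 2, 0), X4 = c(1, 3, 5))

check_tall <- check_design(tall)
check_wide <- check_design(wide)

# With more coefficients than rows, lm() still runs but reports NA
# for the coefficients it cannot estimate.
fit_wide <- lm(Y ~ ., data = wide)
n_na_coef <- sum(is.na(coef(fit_wide)))
```

For the `wide` case, `lm()` does not raise an error; the NA coefficients are the sign that there are too few observations for the number of variables.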

Didzis Elferts
  • Hello, thanks for the answer. If I understand correctly, I need two conditions. The first is to have more rows than columns. The second is to have no missing values: if there is a single missing value, the model is no good. Is that what you mean? – S12000 Dec 19 '12 at 18:50
  • @Swiss1200 You can have some missing values; how many depends on the number of observations you have. But you have to check that the number of complete observations (rows with no missing values) is greater than the number of independent variables (columns) – Didzis Elferts Dec 19 '12 at 19:08
  • Can you explain what `apply(data,1,function(x) sum(is.na(x)))` does, please? I have no `NA` in my data frame, but `apply(data,1,function(x) sum(is.na(x)))` gave me `6 6 ... 6` – Jay Wang Sep 18 '17 at 23:09
  • @JayWong It calculates the number of missing values in each row. If you get 6, some values must be missing – Didzis Elferts Sep 19 '17 at 04:59
-2

I believe I can add a little clarity here, since I ran into this myself and that's why I am here, except that my issue was with gls (the generalized least squares model) rather than the standard linear model. Similar logic "might" apply here, or in a similar situation.

I don't dispute anything that has been said so far. There might be some confusion between what people perceive as an observation and the way R perceives it.

Say you have 160+ independent variables. Say all your data comes from a single source: you import it from a file, a database, etc. Say you have the same number of response variables, or something that satisfies R for the purposes of regression analysis.

R will tell you that you have 2 observations. Now, if you have similar data obtained in exactly the same manner from another source, you have 3 observations if you look at your global environment in RStudio.

The reason I mention this is that the term "observation" in the mathematical sense (as it is being used here) is completely acceptable. R, however, uses the term in more ways than one.

THAT was a big contributor to a similar problem I had: R told me I had missing values, na.omit this, na.action that, etc. When I looked at the OrchardSpray demo and reviewed my own methodology, I figured it out.

The point is that how we perceive an "observation" in data is one thing. R uses the term differently, and the way it words its error messages can cause additional confusion.

See what I mean?