
I am trying to run a linear regression on weighted data.
When I use speedlm, I get an error message if the data contain missing values.

 library(speedglm)
 sampleData <- data.frame(w = round(runif(12,0,1)),
                          target = rnorm(12,100,50),
                          predictor = c(NA, rnorm(10, 40, 10),NA))

 summary(sampleData)
       w              target          predictor    
 Min.   :0.0000   Min.   : -3.381   Min.   :22.58  
 1st Qu.:0.0000   1st Qu.: 48.321   1st Qu.:30.45  
 Median :1.0000   Median : 84.156   Median :37.09  
 Mean   :0.5833   Mean   : 92.306   Mean   :35.03  
 3rd Qu.:1.0000   3rd Qu.:119.891   3rd Qu.:41.96  
 Max.   :1.0000   Max.   :223.896   Max.   :43.48  
                                    NA's   :2
 # run linear regression without weights
 linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
 speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)

 # run linear regression with weights
 linearWithWeights <- lm(formula("target~predictor"), data = sampleData, weights = sampleData[, "w"])
 speedLinearWithWeights <- speedlm(formula("target~predictor"), data = sampleData, weights = sampleData[, "w"])
Error in base::crossprod(x, y) : non-conformable arguments
In addition: Warning messages:
1: In sqw * X :
  longer object length is not a multiple of shorter object length
2: In sqw * y :
  longer object length is not a multiple of shorter object length
Called from: base::crossprod(x, y)

Is there any way around this that does not force me to fix the data before running the regression?

zx8754
eliavs
  • Why are you opposed to removing these two observations from the dataset prior to fitting the model? – Roland Nov 22 '16 at 08:19
  • @Roland What I showed here is only an example. I actually have many data frames, and the NAs are important for the rest of the calculations. – eliavs Nov 22 '16 at 08:21

1 Answer


Try changing the na.action option. Below is your code, which runs for me once na.action is set to na.exclude or na.omit.

library(speedglm)
sampleData <- data.frame(w = round(runif(12,0,1)),
                         target = rnorm(12,100,50),
                         predictor = c(NA, rnorm(10, 40, 10),NA))
summary(sampleData)

linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)

options(na.action="na.exclude") # or "na.omit"

linearNoWeights <- lm(formula("target~predictor"), data = sampleData)
speedLinearNoWeights <- speedlm(formula("target~predictor"), data = sampleData)
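
To illustrate how the two settings differ, here is a minimal sketch using base lm only (not speedlm). The fixed seed and the names fitOmit and fitExclude are assumptions added for the illustration. Both fits use the same complete rows; what changes is whether the residuals are padded back to the original row positions.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
sampleData <- data.frame(target = rnorm(12, 100, 50),
                         predictor = c(NA, rnorm(10, 40, 10), NA))

# na.omit drops the incomplete rows entirely
fitOmit <- lm(target ~ predictor, data = sampleData, na.action = na.omit)

# na.exclude drops them for fitting but remembers where they were
fitExclude <- lm(target ~ predictor, data = sampleData, na.action = na.exclude)

length(residuals(fitOmit))     # one residual per complete row
length(residuals(fitExclude))  # one per original row, NA where data was missing
```

If the NAs matter for later calculations that rely on row positions, na.exclude is probably the more convenient choice.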

See the documentation for na.omit and na.exclude to decide which one fits your case. Hope this helps.
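
If the weighted call still fails, one possible workaround is sketched below. It assumes the error arises because the weights vector keeps all 12 entries while the rows with NA are dropped from the model, which is what the "longer object length" warnings suggest. The idea is to subset the rows and the weights together inside the call, so sampleData itself is never modified. The seed and the names ok, refFit, subFit and speedFit are assumptions for the illustration, and the speedlm call is only run if the package is installed.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
sampleData <- data.frame(w = round(runif(12, 0, 1)),
                         target = rnorm(12, 100, 50),
                         predictor = c(NA, rnorm(10, 40, 10), NA))

# Rows with no NA in any column used by the model
ok <- complete.cases(sampleData[, c("target", "predictor", "w")])

# lm() aligns the weights with the rows it keeps, so this is the reference fit
refFit <- lm(target ~ predictor, data = sampleData, weights = w)

# Same fit on a temporary subset; sampleData is left unchanged
subFit <- lm(target ~ predictor, data = sampleData[ok, ], weights = w)

# Pass the same subset and the matching weights to speedlm
if (requireNamespace("speedglm", quietly = TRUE)) {
  speedFit <- speedglm::speedlm(target ~ predictor,
                                data = sampleData[ok, ],
                                weights = sampleData$w[ok])
}
```

Because the subset exists only inside the call, the NAs stay in sampleData for the rest of your calculations.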

Kumar Manglam