
I have a matrix and a vector

set.seed(1) # added so that the values are reproducible
X <- matrix(rexp(200, rate=.1), ncol=20)
Y <- matrix(rexp(10, rate=.1), ncol=1)

Then I randomly select 5 of the columns of X, as @Laterow suggested:

# select 5 random columns from X
temp <- sample(ncol(X), 5)
X1 <- X[,temp]

Then I combine X1 with my Y

mydata <- data.frame(cbind(Y,X1))

Then I build a regression

fit = lm(Y~.,data=mydata)

Then I obtain the standard errors (se)

se <- sqrt(diag(vcov(fit)))

Now I want to replace the column with the largest se with each of the other columns of my original X, one at a time, and keep the one that gives the lowest se.

For example, if you run the code above, X3 has the largest value in se:

          X3  
7.348126e-18 

So I want to automatically replace column 3 of X1 with each of the other columns of X, except itself.

if you do

> temp
#[1] 18  4  9  8 10

then column 3 of X1 should be replaced, in turn, by every column of X except column 9.

nik
  • So... Why don't you just store the information, e.g. `temp <- sample(ncol(X), 5)`, so that you can later check which columns they were? (edit: and then obviously change the code to `X[,temp]`) – slamballais Oct 19 '16 at 10:47
  • @Laterow Thanks I edited my question now – nik Oct 19 '16 at 10:59

1 Answer


It's really hard to understand what you want to achieve.

  1. Regarding "except 9": you can't include the same column twice in lm (without a transformation) because of collinearity. So you need to replace X3 with each column of X except the ones already used.

But maybe this is what you want:

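temp <- c(18, 4, 9, 8, 10) # your sample
X1 <- X[, temp]
# name the response explicitly; data.frame(cbind(Y, X1)) would call it X1,
# and lm(Y ~ .) would then use Y as a predictor of itself
mydata <- data.frame(Y = as.vector(Y), X1)
fit <- lm(Y ~ ., data = mydata)
se <- summary(fit)$coefficients[-1, "Std. Error"] # se without the intercept
worst_se <- names(which.max(se)) # name of the predictor with the largest se

Xm <- X[, -temp] # all columns of X not used yet

# se of the replaced predictor for each unused column
res2 <- lapply(seq_len(ncol(Xm)), function(i) {
  mydata[[worst_se]] <- Xm[, i] # changes a local copy only
  summary(lm(Y ~ ., data = mydata))$coefficients[worst_se, "Std. Error"]
})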
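
To go one step further and actually keep the best replacement, something like the following might work. It is only a sketch: the names `unused`, `cand_se` and `best` are made up for illustration, the response is stored as a named column `Y`, and "better" is assumed to mean "smaller se of the swapped-in predictor", which is the criterion from the question and not necessarily a good one.

```r
set.seed(1)
X <- matrix(rexp(200, rate = 0.1), ncol = 20)
Y <- rexp(10, rate = 0.1)

temp <- sample(ncol(X), 5)                 # columns currently in the model
mydata <- data.frame(Y = Y, X[, temp])     # columns: Y, X1..X5

se <- summary(lm(Y ~ ., data = mydata))$coefficients[-1, "Std. Error"]
worst <- names(which.max(se))              # predictor with the largest se

unused <- setdiff(seq_len(ncol(X)), temp)  # columns of X not in the model

# se of the swapped-in predictor, one value per unused column
cand_se <- sapply(unused, function(j) {
  d <- mydata
  d[[worst]] <- X[, j]
  summary(lm(Y ~ ., data = d))$coefficients[worst, "Std. Error"]
})

best <- unused[which.min(cand_se)]         # candidate with the smallest se

# keep the swap only if it improves on the current se
if (min(cand_se) < se[worst]) {
  temp[match(worst, names(mydata)[-1])] <- best
  mydata[[worst]] <- X[, best]
}
```

After this, `temp` holds the column indices of X that are in the model, so the same steps can be repeated in a loop if more than one swap is wanted.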

You may also want to look at ?step, which is used to find a "best" model, or here (your task is very similar to it).
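
A rough sketch of what using `step` could look like for this data, assuming you want forward selection of up to five predictors out of the 20 columns of X. The names `full`, `null_fit`, `upper` and `sel` are illustrative, and `step` selects by AIC, not by se, so it may stop before five predictors are added.

```r
set.seed(1)
X <- matrix(rexp(200, rate = 0.1), ncol = 20)
Y <- rexp(10, rate = 0.1)
full <- data.frame(Y = Y, X)               # columns: Y, X1..X20

null_fit <- lm(Y ~ 1, data = full)         # start from the intercept-only model
upper <- reformulate(setdiff(names(full), "Y"), response = "Y")

# forward selection by AIC, adding at most 5 predictors
sel <- step(null_fit,
            scope = list(lower = ~ 1, upper = upper),
            direction = "forward", steps = 5, trace = 0)

formula(sel)                               # the selected model
```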

PS

The highest se does not mean the worst coefficient. (There are statistical tests to check the significance of a coefficient.)
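
For instance, `summary()` of an `lm` fit already reports a t value (estimate divided by se) and a p-value for each coefficient. A small sketch, using the first five columns of X purely as an arbitrary example:

```r
set.seed(1)
X <- matrix(rexp(200, rate = 0.1), ncol = 20)
Y <- rexp(10, rate = 0.1)

mydata <- data.frame(Y = Y, X[, 1:5])      # arbitrary choice of 5 columns
fit <- lm(Y ~ ., data = mydata)

coefs <- summary(fit)$coefficients[-1, ]   # drop the intercept row
# least significant predictors first (largest p-value)
coefs[order(coefs[, "Pr(>|t|)"], decreasing = TRUE), ]
```

The predictor with the largest se is not necessarily the one with the largest p-value, because the t value also depends on the size of the estimate.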

Batanichek
  • Would it be possible to do it automatically from the beginning, before temp is calculated? Also, which type of test do you mean when you say there is a test to check the significance of coefficients? – nik Oct 19 '16 at 13:33
  • Have you read about `step` and what it does? A `t-test`, for example, can be used to check significance (the `p-value` shows the significance of a coefficient). You need to say in words what you want to achieve (find the best model with 5 coefficients from the matrix? what about collinearity and other statistical issues?) – Batanichek Oct 19 '16 at 13:39
  • I want to find which variable among those 5 is least important, then replace it with the columns of X one by one and keep a better one instead – nik Oct 19 '16 at 13:53
  • At this stage I don't care about collinearity; it is an issue when you check one independent variable, but it does not harm the entire model – nik Oct 19 '16 at 13:54