
Here is a challenging yet quite interesting problem I have. I would really like to get this working and hope someone can help me out.

This is in R.

Here is the code I am running to get the R-squared for X1, with the model fit on training:

model=randomForest(X1~.,data=training,importance=TRUE,keep.forest=TRUE)
predicted=predict(model,newdata=testing[,-1])
actual=testing$X1
rsq=1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
print(rsq)

Here is the head of training, to show what it looks like:

head(training)
            X1        X2        X3         X4        X5         X6
68   -3.556526  4.588409 -2.756521  -2.742035 11.542023 -18.405807
23   -1.915947 -0.179710 -0.240580  -0.278259 -0.284058   0.553627
129 -24.252174 -4.869564  4.800001 -14.608688  5.255074 -20.228981
5    -1.637680 -1.147827 -2.005795  -1.121750  0.101440  -1.608688
147 -68.289856 -0.626083 19.933334  -6.637680 15.379715 -11.515945

The columns go up to X77, whereas the number of rows is 73.

My objective is to loop the following

model=randomForest(X1~.,data=training,importance=TRUE,keep.forest=TRUE)
predicted=predict(model,newdata=testing[,-1])
actual=testing$X1
rsq=1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
print(rsq)

for every column up to X77, so that the last iteration would be

model=randomForest(X77~.,data=training,importance=TRUE,keep.forest=TRUE)
predicted=predict(model,newdata=testing[,-77])
actual=testing$X77
rsq=1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
print(rsq)

so that I end up with 77 R-squared values.

My final objective is just to take the mean of those 77 R-squared values.


To Maxim.K and others, here is what I tried:

rsq=function(i){
model=randomForest(testing[,1]~.,data=training,importance=TRUE,keep.forest=TRUE)
predicted=predict(model,newdata=testing[,-i])
actual=testing[,i]
1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
}

rsq=function(i){
model=randomForest(Xi~.,data=training,importance=TRUE,keep.forest=TRUE)
predicted=predict(model,newdata=testing[,-i])
actual=testing[,i]
1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
}

I know the second one makes no sense logically, but it shows what I need. testing$X1 is the same as testing[,1], but putting testing[,1] in the formula does not work; the response has to be given by name, in the form "X1".

How would I do that...?

user2201675
  • I don't find the question very well formulated. You should probably specify what you have tried already, and what has failed. Otherwise it's just asking for free programming services imo. The solution is easy anyway, make a function out of the code you quote, using X.n as the argument for that function. Then use *apply. – Maxim.K Apr 18 '13 at 10:12
  • See the edit. My code didn't work, which is why I didn't share it, but okay. – user2201675 Apr 18 '13 at 22:03

1 Answer


I think Maxim.K has alluded to this already, but something like this would work:

rsq = function(i) {
  n = colnames(testing)[i]
  model=randomForest(as.formula(paste(n,"~.")),data=training,importance=TRUE,keep.forest=TRUE)
  predicted=predict(model,newdata=testing[,-i])
  actual=testing[[n]]
  1-sum((actual-predicted)^2)/sum((actual-mean(actual))^2)
}
sapply(1:77, rsq)
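
Since the stated goal in the question is the mean of all the R-squared values, here is a minimal, self-contained sketch of the same pattern followed by that final step. The data are synthetic and hypothetical (6 columns instead of 77), and lm is used only as a stand-in so the sketch runs with base R alone; the assumption is that randomForest(...) can be dropped into the same place, as in the function above.

```r
# Hypothetical stand-in data: 6 numeric columns named X1..X6.
set.seed(1)
make_data <- function(n) {
  d <- as.data.frame(matrix(rnorm(n * 6), ncol = 6))
  names(d) <- paste0("X", 1:6)
  d$X1 <- d$X2 + 0.1 * rnorm(n)  # give X1 some signal to predict
  d
}
training <- make_data(60)
testing  <- make_data(40)

# Same structure as rsq() above; lm is a stand-in for randomForest (assumption).
rsq <- function(i) {
  n <- colnames(testing)[i]                        # e.g. "X3"
  model <- lm(as.formula(paste(n, "~ .")), data = training)
  predicted <- predict(model, newdata = testing[, -i, drop = FALSE])
  actual <- testing[[n]]
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

all_rsq  <- sapply(seq_along(testing), rsq)  # one R-squared per column
mean_rsq <- mean(all_rsq)                    # the final objective in the question
print(all_rsq)
print(mean_rsq)
```

Using seq_along(testing) instead of a hard-coded 1:77 keeps the loop in step with however many columns the data frame actually has.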
waferthin
  • I am not sure whether line 3 is correct. In the original example it varies, whereas in your function the prediction occurs based on the same data (albeit on a different model). – Maxim.K Apr 18 '13 at 11:40
  • The third line should probably be `as.formula(paste(n,"~."))`, unless `randomForest` works differently than usual model building functions such as `lm`. See http://stackoverflow.com/q/7666807/210673 for an example, as well as the many linked questions. – Aaron left Stack Overflow Apr 18 '13 at 15:35
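
To illustrate the point in the comment above: a formula can be built from a column name held in a string, which is what the Xi~. attempt in the question was missing. This is a small sketch in base R only; the variable names i, n, f1 and f2 are hypothetical. reformulate() from the stats package is shown as an alternative to as.formula(paste(...)).

```r
# Hypothetical column index and the name it would map to.
i <- 3
n <- paste0("X", i)                    # "X3"

# Two ways to turn the string into a formula with that column as the response.
f1 <- as.formula(paste(n, "~ ."))      # paste the text, then convert it
f2 <- reformulate(".", response = n)   # build the formula from its parts

print(f1)
print(f2)
```

Either object can then be passed as the first argument to a formula-based fitting function such as lm.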