Predicting 3 Dependent Outcomes in R Not Working

Question

I have a rather basic model that tries to predict the volume of one stock the next day. However, I'd like to predict all three stocks, so instead of one outcome there are three.

outcomeSymbol <- cbind('AAPL.Volume','ADBE.Volume','ADI.Volume')

Here's what the head of the outcomes looks like (dates in random order):

Here is the training call that works fine with one outcome variable (outcomeSymbol <- 'AAPL.Volume'):

bst <- train(train[,predictorNames],  as.factor(train$outcome),
             method='gbm'
)

But when I run this with the 3 outcome variables, I get:
Error: nrow(x) == n is not TRUE

Do I have to use different parameters or a different model if there is more than one outcome?

The entire code, so you can see everything and run it yourself: https://gist.github.com/alteredorange/b97481ed7e00b33bab0d28dcdd7d0e4a

This sounds like a question about data modeling which is off-topic for this site. Such questions belong on [stats.se]. If it really is about programming, include a minimal [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in the question itself. — MrFlick, Mar 13 '17 at 15:57
@MrFlick The code is kind of long, which is why I included the gist link. Should I include the whole code in the question? — Alteredorange, Mar 13 '17 at 17:15
No. You should recreate only what's necessary to make your problem clear. We are here to answer a specific question, not to go through your entire analysis. But that's only if this is actually a programming question which it doesn't seem like to me. — MrFlick, Mar 13 '17 at 17:20
@MrFlick the specific question is how to model three outcomes in r. The error I get is provided (Error: nrow(x) == n is not TRUE). I'll add the train portion to the question as well. — Alteredorange, Mar 13 '17 at 17:52
It seems like your question is more along the lines of *"what statistical method (with an R implementation) can I use to simultaneously model 3 dependent variables"* - which is not a specific programming problem. You seem to assume that `gbm` works with multiple dependent variables, but I see nothing in the documentation to suggest that is true. I think you should try either the Data Science or Statistics stack exchange sites. — Gregor Thomas, Mar 13 '17 at 18:20
@Gregor True! I'll give those two places a shot. I've asked in other modeling forums more abstractly and they say it's a programming problem. So I try to make it more specific and ask here and it's called a modelling problem :) I'll keep banging away at it, thanks! — Alteredorange, Mar 13 '17 at 20:38

Sandipan Dey · Answer 1 · 2017-03-15T06:52:41.320

You need to change lines 63 to 78 of the gist as follows, so that a separate model is trained for each of the three outcome columns:

set.seed(1234)
split <- sample(nrow(nasdaq100), floor(0.7*nrow(nasdaq100)))

# process the outcome variables for the entire data 
nasdaq100$outcome <- ifelse(nasdaq100$outcome==1,'yes','nope')
nasdaq100$outcome <- sapply(as.data.frame(nasdaq100$outcome), function(x) as.factor(x)) 

train <- nasdaq100[split,]
test <- nasdaq100[-split,]

# learn 3 different models, one for each outcome variable
bst <- lapply(1:3, function(i) train(train[,predictorNames],train$outcome[,i],method='gbm'))

# compute ROC separately for 3 of the models
library(pROC)
auc <- lapply(1:3, function(i) {
  predictions <- predict(object=bst[[i]], test[,predictorNames], type='prob')
  auc(test$outcome[,i],predictions[,2])
})

# auc scores for 3 models
print(paste('AUC score:', auc)) 
# [1] "AUC score: 0.662664263875109" "AUC score: 0.698058147615867" "AUC score: 0.719709083058406"
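
As for why the original call fails: a likely explanation, assuming train$outcome is a three-column matrix (as the train$outcome[,i] indexing above suggests), is that as.factor() drops the matrix dimensions and stacks all three columns into one long vector. The outcome then has three times as many elements as the predictors have rows, which is consistent with the nrow(x) == n check failing. The toy data below is hypothetical and uses base R only; it is a sketch of the shape mismatch, not the nasdaq100 data from the gist.

```r
# Hypothetical toy data standing in for the predictors and the 3 outcomes
set.seed(1)
n <- 10
x <- data.frame(p1 = rnorm(n), p2 = rnorm(n))
y <- matrix(sample(c("yes", "nope"), n * 3, replace = TRUE), ncol = 3,
            dimnames = list(NULL, c("AAPL.Volume", "ADBE.Volume", "ADI.Volume")))

# as.factor() on a matrix discards the dim attribute, so the three
# columns end up stacked in a single factor of length 3 * n
y_flat <- as.factor(y)
length(y_flat)             # 30
nrow(x)                    # 10
nrow(x) == length(y_flat)  # FALSE: predictors and outcome no longer line up

# Taking one column at a time gives an outcome of the right length,
# which is what the lapply() over 1:3 above relies on
y_cols <- lapply(seq_len(ncol(y)), function(i) as.factor(y[, i]))
sapply(y_cols, length)     # 10 10 10
```

Each element of y_cols can be paired with the same predictor rows, so fitting one model per column avoids the mismatch. It does not turn gbm into a joint model of all three outcomes.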

edited Mar 15 '17 at 06:52

answered Mar 13 '17 at 18:37

Sandipan Dey

It'd be helpful to say which lines you changed, and why. – Darren Cook Mar 13 '17 at 20:59
I think this is getting close! But it looks like this might just be batching (making 3 models and 3 predictions) rather than actually making one model, no? As @Gregor mentioned I might need a different model entirely :/ – Alteredorange Mar 13 '17 at 23:12
@DarrenCook The lines changed have comments along with them. – Sandipan Dey Mar 15 '17 at 06:52
