
I've checked an existing question about defining models in R. What I would like to do is swap variables in the model inside a for loop, so that each variable is the target variable once and all other variables are the predictors in that iteration.

df <- data.frame(customer = c(1:5), product1 = c(1,0,1,1,0), product2 = c(0,1,0,0,1), product3 = c(0,1,1,1,0))

  customer product1 product2 product3
1        1        1        0        0
2        2        0        1        1
3        3        1        0        1
4        4        1        0        1
5        5        0        1        0

So I would like to create a for loop with 3 iterations in this case:

mdl <- product1 ~ product2 + product3
mdl <- product2 ~ product1 + product3
mdl <- product3 ~ product1 + product2

To clarify my question, here is my attempt at this for loop:

  for(j in 1:ncol(df)){
    mdl <- df[j] ~ df[-j] # include all variables except target variable
    print(mdl)
  }

This is the output I got:

df[j] ~ df[-j]
df[j] ~ df[-j]
df[j] ~ df[-j]
df[j] ~ df[-j]

I expected this output instead:

product1 ~ product2 + product3
product2 ~ product1 + product3
product3 ~ product1 + product2

In case you wonder why I want this: I want to use it in a for loop that runs a prediction model, as in this example:

naiveBayes(mdl, df, type = "raw")

I hope my question is clear, and that someone can help me out.

Floris
  • I don't understand how you get that desired output from the input data.frame. – MrFlick Mar 06 '17 at 16:31
  • Is this similar to what you want? http://stackoverflow.com/questions/5300595/automatically-create-formulas-for-all-possible-linear-models – MrFlick Mar 06 '17 at 16:32
  • @MrFlick Thanks for the link, it is close to what I was looking for. The difference is that I want every combination to include all variables. So with columns product1, product2 and product3 there are only 3 unique combinations, as shown above, rather than 9. – Floris Mar 06 '17 at 16:46
  • So in every iteration of the for loop, I would like to take 1 column as the target variable and all other columns as predictors. – Floris Mar 06 '17 at 16:48
  • Don't edit your question to add a new, different question. This one has been answered. If you now have a different question, start a new post on this site. – MrFlick Mar 06 '17 at 19:46

1 Answer


With setdiff and lapply you can build the formula combinations:

varNames = colnames(df)[-1]  # all columns except customer


lapply(varNames, function(x) paste0( x ,"~", paste0(setdiff(varNames,x),collapse="+" ) ) )
#[[1]]
#[1] "product1~product2+product3"
#
#[[2]]
#[1] "product2~product1+product3"
#
#[[3]]
#[1] "product3~product1+product2"
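Since the question asked for a for loop specifically, here is a minimal sketch of the same idea as a plain loop. It swaps paste0() for base R's reformulate(), which builds a formula object directly from a character vector of predictors and a response name, and it loops over column names instead of indices so the customer column is skipped. The names `formulas` and `target` are illustrative only.

```r
df <- data.frame(customer = 1:5,
                 product1 = c(1, 0, 1, 1, 0),
                 product2 = c(0, 1, 0, 0, 1),
                 product3 = c(0, 1, 1, 1, 0))

varNames <- setdiff(colnames(df), "customer")  # candidate target variables
formulas <- vector("list", length(varNames))

for (j in seq_along(varNames)) {
  target <- varNames[j]
  # response = target, predictors = every other product column
  formulas[[j]] <- reformulate(setdiff(varNames, target), response = target)
  print(formulas[[j]])
}
```

Each element of `formulas` is already a formula, so no as.formula() step is needed before passing it to a modelling function.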

To incorporate these into your model, you could do:

library(e1071)  # provides naiveBayes()

modelList = lapply(varNames, function(x) {
  depVar = x
  indepVar = setdiff(varNames, x)

  formulaVar = as.formula(paste0(depVar, "~", paste0(indepVar, collapse = "+")))

  # type = "raw" is an argument of predict(), not of naiveBayes()
  nbModel = naiveBayes(formulaVar, data = df)

  outputList = list(indepVar = paste0(indepVar, collapse = ","),
                    depVar = depVar,
                    nbModel = nbModel)
  return(outputList)
})

This returns a list in which each element holds the independent variables, the dependent variable and the fitted Naive Bayes model.

To access any one of these, use modelList[[1]] and so on; length(modelList) gives the number of models in the list.
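Class probabilities come from predict() with type = "raw", not from naiveBayes() itself. A minimal sketch, assuming the e1071 package is installed, that fits one model per product on all rows and then predicts on the same rows (as in the setup described in the comments) to get the probability that each customer has a 1 for each product. Converting the 0/1 columns to factors, so they are treated as categories rather than Gaussian numerics, is my own choice here; `dfF`, `buyProb` and `result` are hypothetical names.

```r
library(e1071)  # assumption: installed; provides naiveBayes() and its predict() method

df <- data.frame(customer = 1:5,
                 product1 = c(1, 0, 1, 1, 0),
                 product2 = c(0, 1, 0, 0, 1),
                 product3 = c(0, 1, 1, 1, 0))
varNames <- colnames(df)[-1]

# Treat the 0/1 product columns as categorical variables with fixed levels
dfF <- df
dfF[varNames] <- lapply(df[varNames], factor, levels = c(0, 1))

# One model per product; the training rows are also used as the test rows
buyProb <- sapply(varNames, function(target) {
  fit <- naiveBayes(reformulate(setdiff(varNames, target), response = target),
                    data = dfF)
  # type = "raw" gives a matrix of class probabilities, one column per level
  predict(fit, newdata = dfF, type = "raw")[, "1"]
})

result <- data.frame(customer = df$customer, buyProb)
print(result)
```

Predicting on the training data makes these in-sample probabilities, so they will look more confident than predictions on new customers would.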

For alternative methods to generate combinations, see ?combn and ?expand.grid

combn(varNames,2)
#     [,1]       [,2]       [,3]      
#[1,] "product1" "product1" "product2"
#[2,] "product2" "product3" "product3"

head(expand.grid(varNames,varNames,varNames))
#      Var1     Var2     Var3
#1 product1 product1 product1
#2 product2 product1 product1
#3 product3 product1 product1
#4 product1 product2 product1
#5 product2 product2 product1
#6 product3 product2 product1
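combn() can also produce the exact set of formulas from the question: choosing n - 1 of the n variables gives every predictor set, and the one variable left out of each set is its response. A minimal sketch using base R only; `predictorSets` is an illustrative name. Note the resulting order differs from the lapply version, because combn() enumerates predictor sets rather than targets.

```r
varNames <- c("product1", "product2", "product3")

# Each element is one set of n - 1 predictors
predictorSets <- combn(varNames, length(varNames) - 1, simplify = FALSE)

formulas <- lapply(predictorSets, function(preds) {
  target <- setdiff(varNames, preds)  # the single variable not in this set
  reformulate(preds, response = target)
})
```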
Silence Dogood
  • You provided exactly what I was looking for, so thanks a lot for the great answer! The final step for me, though, is a data frame with the likelihood that a customer is going to buy each product: for each customer (id 1 to 5 in the example data set), the probability that the value is 1 (which means "buy" in this example). Great answer, though it is still difficult for me to incorporate it in the context I want to use it for. In the model I use the data set as both the training set and the test set. – Floris Mar 06 '17 at 17:36
  • I've edited my question to notify other people that you've given the right answer and that I added a next related question in the comment. So in the edit I've added the desired output of this extra question. – Floris Mar 06 '17 at 17:43
  • Could you also include sample output of one of the nbModel and how this result relates to your desired output – Silence Dogood Mar 06 '17 at 17:55
  • I've edited my question again and included the sample output you asked for. I couldn't find any question on this platform about how to run a naiveBayes model on multiple combinations of models so again an answer to this extra question would be great! – Floris Mar 06 '17 at 19:35
  • I've created a new question for this where I've included the sample output you asked for as well. Hopefully you could find the time to have a look at it. http://stackoverflow.com/questions/42635930/run-naivebayes-model-on-columns-as-well-as-rows-in-a-for-loop – Floris Mar 06 '17 at 21:32