Suppose I have data on which I want to run many linear regression models.

Data: https://www.img.in.th/image/TNHdEq

Column C1 is the y variable.

The x variable is column C4, which is built from columns C2 and C3. Model1 uses the first row of C2 and the remaining 8 rows of C3, Model2 uses the first 2 rows of C2 and the remaining 7 rows of C3, and so on, until Model9 uses the first 8 rows of C2 and the last row of C3.

Example x variable:

model1 : { b, d, i,...,z}

model2 : { b, f, i,..., z}

.

.

.

model9 : {b, f, h,..., z}
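Since the question also asks about Python, the construction described above might be sketched as follows. This is only a minimal sketch, assuming C2 and C3 are plain lists of equal length; `splice_columns` and the sample values are hypothetical, introduced for illustration, and are not the data from the question.

```python
def splice_columns(c2, c3):
    """Return one candidate x column per model.

    Model i (1-based) takes the first i values from c2 and the
    remaining values from c3, for i = 1 .. n - 1.
    """
    if len(c2) != len(c3):
        raise ValueError("c2 and c3 must have the same length")
    n = len(c2)
    return [c2[:i] + c3[i:] for i in range(1, n)]


# Illustrative values only (assumption, not the question's data).
c2 = ["a2", "b2", "c2", "d2"]
c3 = ["a3", "b3", "c3", "d3"]

candidates = splice_columns(c2, c3)
for i, c4 in enumerate(candidates, start=1):
    print(f"model{i}: {c4}")
```

With n rows this yields n - 1 candidate columns, one per split point.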

Then select the model with the maximum R squared.
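For the plain linear regression case, the selection step could look roughly like the sketch below. It assumes a single numeric predictor, so that R squared equals the squared correlation between x and y; `r_squared`, `best_model` and the numbers are hypothetical names and data made up for illustration. It does not cover an ordered probit model, for which R squared is not defined in this form.

```python
def r_squared(x, y):
    """R squared of the least-squares line y = a + b*x (one numeric predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return None  # x or y is constant: R squared is undefined
    return sxy ** 2 / (sxx * syy)


def best_model(y, c2, c3):
    """Return (model number, R squared) for the best spliced predictor."""
    scores = {}
    for i in range(1, len(y)):
        x = c2[:i] + c3[i:]  # first i rows of C2, remaining rows of C3
        r2 = r_squared(x, y)
        if r2 is not None:
            scores[i] = r2
    if not scores:
        raise ValueError("no candidate predictor varies")
    best = max(scores, key=scores.get)
    return best, scores[best]


# Illustrative numbers only (assumption, not the question's data).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
c2 = [1.0, 2.0, 3.0, 0.0, 0.0]
c3 = [9.0, 9.0, 9.0, 4.0, 5.0]

best, r2 = best_model(y, c2, c3)
print(best, r2)
```

Candidates whose predictor does not vary are skipped rather than scored, since a regression on a constant x carries no information.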

Question: How do I code this? With a loop?

I would like to do it in both R and Python.

P.S. In reality I use an ordered probit model, and I have many rows (100+).

Thank you.

nitishagar
  • See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) how to ask a good question, and see [here](https://r4ds.had.co.nz/many-models.html) how to run multiple models in R. –  Apr 16 '20 at 11:08

1 Answer

Running many models can be done with an *apply loop, with the results collected in a list object. In this case the loop variable is the row number i, varying from 1 to nrow(df1) - 1.

n <- nrow(df1)
probit_list <- lapply(seq.int(n)[-n], function(i){
  C4 <- c(df1$C2[seq.int(i)], df1$C3[-seq.int(i)])
  C4 <- ordered(C4, levels = levels(df1$C2))
  dftmp <- data.frame(C1 = df1$C1, C4)
  tryCatch(glm(C1 ~ C4, data = dftmp, family = binomial(link = "probit")),
           error = function(e) e)
})

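To see how many fits succeeded (that is, did not return an error), run

failed <- sapply(probit_list, inherits, "error")  # TRUE where the fit returned an error
sum(!failed)                                      # number of models fitted without error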
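The same trap-and-count pattern might be written in Python roughly as below. This is a sketch under stated assumptions: `fit_or_error` is a hypothetical helper, and `toy_fit` is a made-up stand-in for a real model fit that only mimics the "2 or more levels" failure; neither comes from any library.

```python
def fit_or_error(fit, x, y):
    """Call fit(x, y); return the exception object instead of raising it."""
    try:
        return fit(x, y)
    except Exception as exc:
        return exc


def toy_fit(x, y):
    """Hypothetical stand-in for a model fit; rejects a constant predictor."""
    if len(set(x)) < 2:
        raise ValueError("predictor needs 2 or more distinct values")
    return {"n_obs": len(y), "n_levels": len(set(x))}


# Illustrative data only (assumption): the first candidate is constant.
y = [0, 1, 0, 1]
candidates = [[1, 1, 1, 1], [1, 2, 1, 1], [1, 2, 2, 1]]

results = [fit_or_error(toy_fit, x, y) for x in candidates]
failed = [isinstance(r, Exception) for r in results]
n_ok = failed.count(False)  # number of fits that did not fail
print(n_ok)
```

Keeping the exception objects in the result list makes it possible to inspect afterwards why each individual fit failed.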

Test data

set.seed(1234)
n <- 9
df1 <- data.frame(
  C1 = rbinom(n, 1, prob = c(0.4, 0.6)),
  C2 = ordered(sample(1:4, n, TRUE), levels = 1:4),
  C3 = ordered(sample(1:4, n, TRUE), levels = 1:4)
)
Rui Barradas
  • @NuttapongKumpanich What kind of problem? If you can give more information I might have some idea on how to solve it. – Rui Barradas Apr 20 '20 at 18:04
  • I found a problem when I adapted your code to my data. https://www.img.in.th/image/TeXmEM . Data: https://drive.google.com/file/d/1n1oy47j0DLIqoX1HhSP5iRGqy7yGNg4S/view?usp=sharing – Nuttapong Kumpanich Apr 20 '20 at 18:26
  • At the "probit_list" line this occurred: "Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only to factors with 2 or more levels " – Nuttapong Kumpanich Apr 20 '20 at 18:29
  • @NuttapongKumpanich That means that your data has only one unique value. There's no point in running a regression if the regressor doesn't vary. Anyway, I have edited with error trapping code. See now if it works. – Rui Barradas Apr 20 '20 at 18:41
  • There is no error now, but -summary(probit_list)- gives no output about the models. – Nuttapong Kumpanich Apr 20 '20 at 19:13
  • @NuttapongKumpanich That means that there are no 2 or more levels in the regressors. Revising what you are doing seems to be the only solution. I have added code to see how many regressions are not OK. – Rui Barradas Apr 20 '20 at 19:53
  • The code to see how many regressions failed, -sapply(probit_list, inherits) == "error"-, gives "Error in FUN(X[[i]], ...) : argument "what" is missing, with no default" – Nuttapong Kumpanich Apr 20 '20 at 20:27
  • @NuttapongKumpanich Sorry, see it now. – Rui Barradas Apr 20 '20 at 21:52
  • The result is 0. – Nuttapong Kumpanich Apr 21 '20 at 02:58