I'm new to R, and I'm a bit confused!

I'm trying to get the column names of a CSV, then iterate over them and use each one as a key in a linear model function.

I've been getting errors when trying to do this by getting the column names like so:

columns <- as.list(VBPersonasMulti[0,2:length(VBPersonasMulti)])

and then referencing these as keys in the lm function:

for (i in seq_along(column)) {
    anal <- lm(open ~ unlist(column[i]), data = VBPersonasMulti)
}

I have tried without unlist, with several other functions, and also with column[[i]]. A solution to the above method would be ideal, but I am also having problems with a less dynamic version of this iteration,

which creates a fixed list of the column names I really wanted to iterate over from the CSV (or a reassignment of that column):

colnames <- list('attempted','open', 'completed', 'attempted', 'earned', 'commented', 'X7'
               , 'logout', 'join', 'leave', 'flag_as_inaproppriate')
for (i in seq_along(colnames)) {
  print(colnames[i])
  anal <- lm(open ~ unlist(colnames[i]), data = VBPersonasMulti)
  plot(anal)
}

But when the code tries to use the member of the list as a key in its lm function, I get this error:

Error in model.frame.default(formula = open ~ unlist(colname[i]), data = VBPersonasMulti, : variable lengths differ (found for 'unlist(colname[i])')

If I try to access the column name using colname[i] or colname[[i]], I get the error:

invalid type (list) for variable 'colname[i]'

Sorry for the newbie question, and apologies if I've struggled to describe the problem accurately.

What I would like to happen is that, for each column name, the lm function runs with that column as the predictor in the formula passed to lm.

desertnaut
Happy Machine
  • Why do you feel you need to use a list? Also the problem is that your names are strings but you want them as variables ... so one approach is to build the formula using paste0() and then use as.formula() to make it usable in lm. – Elin Feb 02 '19 at 00:34
  • Here's one example of a [question](https://stackoverflow.com/q/22955617/324364) dealing with similar issues, there are many others. The main misconception you're having is that `lm`'s first argument is a _formula_. Those are special objects in R, often created from _bare symbols_ that are column _names_ not the columns of data themselves. If you have the column names as characters, that's different and you need to _construct_ a formula using `paste()` and `as.formula()`. – joran Feb 02 '19 at 00:36
  • For example, a formula might be `variable1 ~ variable2`, but if we have `x <- "variable2"`, we _cannot_ do `variable1 ~ x`. Instead we have to build the formula. – joran Feb 02 '19 at 00:38
  • As a side note, creating a list with `list()` is typically most useful if the objects in it may be of different types, a mix of characters, numeric or more complex objects like another list. If you're collecting things that are all the same atomic type, like strings, just use `c()`. – joran Feb 02 '19 at 00:39
  • You may also be interested in the `reformulate` function which allows construction of formulas from character vectors. As joran mentions, plenty of similar questions abound. Some of the answers use `reformulate`. – lmo Feb 02 '19 at 01:16
  • Your problem is not *"Iterating over list"*, that's an [XY problem](https://meta.stackoverflow.com/questions/tagged/xy-problem). It's *"construct formula from specified variables"*. And you don't even need to use the variable names, you can use the columns of dataframe/matrix. So this is a duplicate of existing questions. – smci Feb 02 '19 at 01:25
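
The point made in the comments above can be sketched roughly as follows. This is only an illustration, not code from the thread: the data frame `df` and its columns `variable1` and `variable2` are invented for the example, and the object names `f_literal`, `f_pasted` and `f_reform` are hypothetical.

```r
# Hypothetical data, invented for this illustration.
set.seed(1)
df <- data.frame(variable1 = rnorm(20), variable2 = rnorm(20))

x <- "variable2"  # a column name held as a string

# A formula stores its symbols unevaluated, so `variable1 ~ x` refers to
# something called `x`, not to the column named "variable2".
f_literal <- variable1 ~ x
all.vars(f_literal)   # "variable1" "x"

# Building the formula from the string gives the intended terms.
f_pasted <- as.formula(paste("variable1 ~", x))
f_reform <- reformulate(x, response = "variable1")
all.vars(f_pasted)    # "variable1" "variable2"

# The constructed formula can be passed straight to lm().
fit <- lm(f_pasted, data = df)
```

Both `as.formula(paste(...))` and `reformulate()` are part of base R's stats package; which one to use is mostly a matter of taste.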

1 Answer

Use formula() inside the lm() call to turn a formula written as a string into a formula object. And don't bother using a list; a character vector is enough. E.g.:

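# Simulated stand-in for the asker's data
VBPersonasMulti <- data.frame(open = rnorm(100, 3, 2),
  attempted = rnorm(100, 1, 2),
  completed = rnorm(100, -1, 3))

# A character vector of predictor names; no list needed
colnames <- c('attempted', 'completed')

for (colname in colnames) {
  print(colname)
  # Build the formula from a string, e.g. "open ~ attempted"
  anal <- lm(formula(paste('open ~', colname)), data = VBPersonasMulti)
  plot(anal)  # diagnostic plots for this model
}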
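
One limitation of a loop like this is that the fitted model is overwritten on each pass, so only the last one is left afterwards. A possible variation, sketched here on the assumption that you want to keep every fit, collects the models in a named list. It takes the predictor names from the data frame itself and uses `reformulate()` from base R; the names `predictors` and `fits` are made up for this example.

```r
# Simulated data with the same column names as the example above.
set.seed(42)
VBPersonasMulti <- data.frame(
  open      = rnorm(100, 3, 2),
  attempted = rnorm(100, 1, 2),
  completed = rnorm(100, -1, 3)
)

# Every column except the response; `predictors` is a hypothetical name.
predictors <- setdiff(names(VBPersonasMulti), "open")

# Fit one model per predictor and keep them all in a named list.
fits <- lapply(predictors, function(p) {
  lm(reformulate(p, response = "open"), data = VBPersonasMulti)
})
names(fits) <- predictors

# Inspect a single model, or pull one number out of each.
summary(fits[["attempted"]])
sapply(fits, function(m) summary(m)$r.squared)
```

Each model can then be plotted or summarised individually, for example with `plot(fits[["completed"]])`.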
tofd