The overall goal is to call lm(cbind(data$response1, data$response2) ~ ., data = data). When I use $, the ~ . adds every variable in data as a predictor except "response1" and "response2".

I would be very grateful if anyone could help me create a function that takes a data frame and a variable name and returns that variable. For example:

(expected output)

    create.vector <- function(data, variable.name) {
        return(data$variable.name)
    }
    data <- iris
    head(create.vector(iris, "Species"))
    ## [1] setosa setosa setosa setosa setosa setosa
    ## Levels: setosa versicolor virginica

I tried adding the line paste(data, variable.name, collapse = "$"), but the result is still a character string...

Dij

3 Answers

data$variable.name looks for a column literally named variable.name, and there is no such column in iris.

For example, note that the column named variable.name, not the column named Species, is returned here:

create.vector <- function(data, variable.name) {
    return(data$variable.name)
}

DF <- data.frame(variable.name = 1, Species = 2)
create.vector(DF, "Species")
## [1] 1

What you want is:

return(data[[variable.name]])

Be sure to use double square brackets: single square brackets return a data.frame with one column rather than a vector, and a vector is what it seems you want.
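
As a rough illustration of the difference between the three accessors, here is a minimal sketch on the built-in iris data. The object names (by_double, by_single, by_dollar) are made up for this example only.

```r
# Column name held in a variable, as it would be inside the function
variable.name <- "Species"

by_double <- iris[[variable.name]]  # extracts the column itself
by_single <- iris[variable.name]    # keeps a one-column data.frame
by_dollar <- iris$variable.name     # looks for a column literally named "variable.name"

class(by_double)    # "factor"
class(by_single)    # "data.frame"
is.null(by_dollar)  # TRUE, since iris has no such column
```

Only the double-bracket form both honours the variable and returns a plain vector.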

Also, although it is not wrong, we don't need the return keyword: a function returns the value of the last expression it evaluates, so the return line could be just:

data[[variable.name]]
G. Grothendieck
  • It may be worth noting that `[[` and `$` can be used as function calls when enclosed in backticks. I think that helped me improve my understanding of what was going on (e.g., `` `[[`(iris, "Species") `` and `` `$`(iris, "Species") ``). – zack May 09 '19 at 19:17
  • Thank you for this suggestion! The issue is that I then want to call `lm(cbind(iris$Sepal.Width, iris$Sepal.Length) ~ ., data = iris)`. When I use `$`, the lm function knows to exclude `Sepal.Width` and `Sepal.Length` from the X variables specified by the `~ .`. But the double brackets don't seem to solve that... – Dij May 09 '19 at 19:19
  • The answer responds to your question. The problem in the comment is really different and you don't need square brackets or $ for that. `f <- function(data, ...) { fo <- sprintf("cbind(%s) ~.", paste(..., sep = ",")); lm(fo, data) }; f(iris, "Sepal.Length", "Sepal.Width")` – G. Grothendieck May 09 '19 at 19:33
  • Thank you, this last comment was very helpful actually and worked! – Dij May 09 '19 at 19:42
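
The one-liner in the comment above can be unpacked roughly as follows. This is only a sketch: the helper name fit_responses is hypothetical, and it assumes the column names are syntactically valid R names, because they are pasted directly into the formula text.

```r
# Hypothetical helper: fit a multivariate lm() using the named columns as responses
fit_responses <- function(data, ...) {
  responses <- c(...)
  # Build the formula text, e.g. "cbind(Sepal.Width, Sepal.Length) ~ ."
  fo <- as.formula(sprintf("cbind(%s) ~ .", paste(responses, collapse = ", ")))
  lm(fo, data = data)
}

fit <- fit_responses(iris, "Sepal.Width", "Sepal.Length")
# The response columns appear as bare names in the formula,
# so the `.` on the right-hand side leaves them out of the predictors
rownames(coef(fit))
```

Because the formula is built from the names themselves rather than from data[[...]] expressions, the full data frame can be passed to lm() unchanged.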

Update per OP comments
Here's how to pass in two column names for the call to lm():

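create.vector <- function(data, v1, v2) {
  # Responses are pulled out with [[ ]]; the same two columns are dropped
  # from the data argument so that `.` does not reuse them as predictors
  lm(cbind(data[[v1]], data[[v2]]) ~ ., data[, !names(data) %in% c(v1, v2)])
}
create.vector(iris, "Sepal.Width", "Sepal.Length")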

Output:

Call:
lm(formula = cbind(data[[v1]], data[[v2]]) ~ ., data = data[, 
    !names(data) %in% c(v1, v2)])

Coefficients:
                   [,1]       [,2]     
(Intercept)         3.048497   3.682982
Petal.Length        0.154676   0.905946
Petal.Width         0.623446  -0.005995
Speciesversicolor  -1.764104  -1.598362
Speciesvirginica   -2.196357  -2.112647

Original
Double-bracket indexing, df[[col]], is an alternative to indexing into a data frame with the dollar-sign ($) notation, and unlike $ it accepts a column name stored in a variable. Here's a short review of accessors in R.

create.vector <- function(data, variable.name) data[[variable.name]]

head(create.vector(iris, "Species"))
[1] setosa setosa setosa setosa setosa setosa
Levels: setosa versicolor virginica
andrew_reece
  • Thank you for this suggestion! The issue is that I then want to call `lm(cbind(iris$Sepal.Width, iris$Sepal.Length) ~ ., data = iris)`. When I use `$`, the lm function knows to exclude `Sepal.Width` and `Sepal.Length` from the X variables specified by the `~ .`. But the double brackets don't seem to solve that... – Dij May 09 '19 at 19:19
  • Hi, see the updated solution; the double-bracket approach seems to work. What happens when you try it? – andrew_reece May 09 '19 at 19:25
  • Hi, thank you again for updating your suggestion, but if you look at the output of the lm summary, Sepal.Width and Sepal.Length are included as predictors in addition to being V1 and V2 (the response variables). I do not want to predict those variables with themselves... – Dij May 09 '19 at 19:26
  • This is just an indexing issue: remove the columns from the `data` argument. I've updated with one way to do this. – andrew_reece May 09 '19 at 19:30

There's a built-in function for this, getElement:

getElement(object=iris, name="Species")
# [1] setosa     setosa     setosa     setosa     setosa     setosa    
# [7] setosa     setosa     setosa     setosa     setosa     setosa    
# [13] setosa     setosa     setosa     setosa     setosa     setosa    
# [19] setosa     setosa     setosa     setosa     setosa     setosa    
# [25] setosa     setosa     setosa     setosa     setosa     setosa    
# [31] setosa     setosa     setosa     setosa     setosa     setosa    
# [37] setosa     setosa     setosa     setosa     setosa     setosa    
# [43] setosa     setosa     setosa     setosa     setosa     setosa    
# [49] setosa     setosa     versicolor versicolor versicolor versicolor
# [55] versicolor versicolor versicolor versicolor versicolor versicolor
# [61] ...
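
A short sketch of how this could replace the function from the question. For an ordinary (non-S4) object such as a data frame, getElement should give the same result as double-bracket indexing. The wrapper name get_column is hypothetical.

```r
# Hypothetical wrapper around the built-in getElement()
get_column <- function(data, variable.name) {
  getElement(data, variable.name)
}

species <- get_column(iris, "Species")

# Same result as double-bracket indexing on a data frame
identical(species, iris[["Species"]])
```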
jay.sf