linear_regression <- function(variable1, variable2, dataset){

  
  variable1 <- match(variable1, colnames(dataset))
  variable2 <- match(variable2, colnames(dataset))

  variable1 <- dataset[,variable1]
  variable2 <- dataset[,variable2]

  
  lm_out <- lm(variable1 ~ variable2, data = dataset)
  return(summary(lm_out))
}

linear_regression('mpg', 'cyl', mtcars)

In the example above, I would like to perform an operation on two variables within a dataset inside a function. However, to reference a variable in the function call, I need to pass the variable name in quotation marks, and before I can perform the operation I have to use that name to look up the column number within the dataset.

I am curious whether there is a simpler way to reference a column within a dataset inside a function.

Liam Haller
  • What does *perform an operation on two variables within a dataset in a function* mean? Give an example of this. `reformulate(variable2, variable1)` will return a formula that can be used by `lm`. – G. Grothendieck Mar 14 '23 at 14:56
  • I think this should answer your question: [How to use reference variables by character string in a formula?](https://stackoverflow.com/questions/17024685/how-to-use-reference-variables-by-character-string-in-a-formula) – Ian Campbell Mar 14 '23 at 14:57
  • `function(dv, iv, df) summary(lm(paste(dv, "~", iv), df))` – andrew_reece Mar 14 '23 at 15:08

1 Answer

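We can use reformulate(), which builds a formula from character strings (the term labels first, then the response), so there is no need to look up column numbers: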

linear_regression <- function(variable1, variable2, dataset){
  
  lm_out <- lm(reformulate(variable2, variable1), data = dataset)
  return(summary(lm_out))
}

linear_regression('mpg', 'cyl', mtcars)
#> 
#> Call:
#> lm(formula = reformulate(variable2, variable1), data = dataset)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.9814 -2.1185  0.2217  1.0717  7.5186 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
#> cyl          -2.8758     0.3224   -8.92 6.11e-10 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.206 on 30 degrees of freedom
#> Multiple R-squared:  0.7262, Adjusted R-squared:  0.7171 
#> F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

Created on 2023-03-14 with reprex v2.0.2
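
To make the mechanics a bit more concrete, here is a minimal sketch of what reformulate() returns and how the same idea might extend to several predictors. The helper name `linear_regression_multi` and its arguments are hypothetical and only for illustration. The last line also shows that a single column can be pulled out by name with `[[`, in case the column values themselves are needed rather than a formula.

```r
# reformulate() takes the term labels first and the response second,
# and returns an ordinary formula object.
f <- reformulate("cyl", response = "mpg")
deparse(f)
#> [1] "mpg ~ cyl"

# Hypothetical variant (not from the original answer) that accepts
# a character vector of predictors instead of a single name.
linear_regression_multi <- function(response, predictors, dataset) {
  model_formula <- reformulate(predictors, response = response)
  summary(lm(model_formula, data = dataset))
}

fit <- linear_regression_multi("mpg", c("cyl", "wt"), mtcars)
rownames(coef(fit))
#> [1] "(Intercept)" "cyl"         "wt"

# A column can also be extracted directly by its name as a string,
# without match() and a column number.
mpg_values <- mtcars[["mpg"]]
```

Passing a formula built from strings keeps the column names in the coefficient table (`cyl`, `wt`), whereas fitting on extracted vectors would label the terms with the function's argument names.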

TimTeaFan