
Hello, I'm building an R function and would like to understand why one version works and the other does not.

This one does not work:

isola_por_fator_em_col <- function(data,col,fator)
{

  y <- data[which(data$col==fator),]
  
  x <- select_if(y,is.numeric)
  
  summary(x)
  
}

isola_por_fator_em_col(data=desempenho_aluno_escola,col=priv,fator="privada")


Warning message:
Unknown or uninitialised column: `col`.

It also does not work when I pass the column name as a string:

isola_por_fator_em_col(data=desempenho_aluno_escola,col="priv",fator="privada")

This one works:

isola_por_fator_em_col <- function(data,col,fator)
{
  y <- data[which(data[col]==fator),]

  x <- select_if(y,is.numeric)
  
  summary(x)
}

isola_por_fator_em_col(data=desempenho_aluno_escola,col="priv",fator="privada")

   desempenho         horas            texp     
 Min.   : 11.40   Min.   : 4.00   Min.   : 9.0  
 1st Qu.: 51.42   1st Qu.:16.00   1st Qu.: 9.0  
 Median : 67.45   Median :21.00   Median :10.0  
 Mean   : 66.55   Mean   :20.06   Mean   :13.3  
 3rd Qu.: 82.47   3rd Qu.:25.00   3rd Qu.:19.0  
 Max.   :108.00   Max.   :31.00   Max.   :20.0 

Basically, what is the difference between $ and [] in R? When I call data$priv OUTSIDE the function, it returns the column with no problem.

I think [] returns the COLUMN while $ returns the values, but I don't understand why comparing the values inside the function would not work.

If I call

desempenho_aluno_escola[which(desempenho_aluno_escola$priv=="privada"),]

outside the function, it works normally.

1 Answer


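The $ operator does not evaluate what follows it; it uses non-standard evaluation to capture the name exactly as typed. So data$col is never substituted for data$priv. Inside your function, data$col is always interpreted as data[['col']] and not data[['priv']], whether you pass priv or "priv". Since your data has no column named col, the lookup returns NULL (hence the warning), the comparison with fator is empty, and no rows are selected. By contrast, [ and [[ evaluate their argument, which is why data[col] works when col is "priv". If you want to pass unquoted column names, there are various ways around this. For example: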

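isola_por_fator_em_col <- function(data, col, fator)
{
  # Capture the unquoted column name as a string, e.g. priv becomes "priv"
  col <- deparse(substitute(col))

  # Keep rows where that column equals fator, then summarise the numeric columns
  summary(dplyr::select_if(data[data[[col]] == fator,], is.numeric))
}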

Which gives you:

isola_por_fator_em_col(iris, Species, 'setosa')
#>   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
#>  Min.   :4.300   Min.   :2.300   Min.   :1.000   Min.   :0.100  
#>  1st Qu.:4.800   1st Qu.:3.200   1st Qu.:1.400   1st Qu.:0.200  
#>  Median :5.000   Median :3.400   Median :1.500   Median :0.200  
#>  Mean   :5.006   Mean   :3.428   Mean   :1.462   Mean   :0.246  
#>  3rd Qu.:5.200   3rd Qu.:3.675   3rd Qu.:1.575   3rd Qu.:0.300  
#>  Max.   :5.800   Max.   :4.400   Max.   :1.900   Max.   :0.600  
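
If you would rather keep passing the column name as a string, a rough base-R sketch of the same idea might look like the following. It uses the built-in iris data, and the function name resumo_por_fator is made up for illustration (it is not from the question). The first three checks show how $, [[ and [ differ when the name is stored in a variable.

```r
# Hypothetical demo: `col` holds a column name as a string.
col <- "Species"

# `$` takes the name literally, so it looks for a column called "col".
# A plain data.frame returns NULL silently; a tibble also warns.
is.null(iris$col)

# `[[` evaluates its argument and returns the Species column as a vector.
is.factor(iris[[col]])

# `[` with a single name returns a one-column data frame, not a vector.
is.data.frame(iris[col])

# String-based variant (hypothetical name), base R only.
resumo_por_fator <- function(data, col, fator) {
  # which() keeps only the TRUE positions, so NA comparisons are dropped
  linhas <- which(data[[col]] == fator)
  # Logical index of the numeric columns
  numericas <- vapply(data, is.numeric, logical(1))
  summary(data[linhas, numericas, drop = FALSE])
}

res <- resumo_por_fator(iris, "Species", "setosa")
print(res)
```

Using [[ rather than [ for the comparison gives a plain vector, which is usually what you want on the left of ==.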
Allan Cameron