The key to the solution is to call the filter() and select() functions of dplyr so that variable names are resolved using SE (standard evaluation, where variables are given as strings, e.g. "sold_fee", as in outliers[,"sold_fee"] in base R), as opposed to NSE (non-standard evaluation, where variables are given as unquoted names, e.g. sold_fee, as in outliers$sold_fee in base R) (*).
NSE is the default type of evaluation in dplyr functions, which makes it less than straightforward to refer to a column whose name is stored in another variable (which is exactly what your loop needs to do).
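To make the distinction concrete, here is a minimal sketch on a hypothetical toy data frame (the data and the threshold 10 are made up; only the column name comes from your question):

```r
library(dplyr)

# Hypothetical toy data, for illustration only
df <- data.frame(sold_fee = c(1, 5, 100))
v <- "sold_fee"

# Base R: the SE form (string) and the NSE-like form ($) return the same column
same_column <- identical(df[, "sold_fee"], df$sold_fee)

# Base R SE also works when the name is held in a variable
values <- df[[v]]

# dplyr NSE: there is no column called v, so dplyr falls back to the string
# stored in v and compares the text "sold_fee" with 10, not the column values
not_what_we_want <- filter(df, v > 10)
```

The last call runs without an error, which is what makes this mistake easy to miss.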
From the documentation for filter() and select() we deduce that the way to use SE differs between the two, as follows:
In filter() we should use the .data pronoun. In your example, it would be:
v = "sold_fee"
filter(.data[[v]] > upr | .data[[v]] < lwr)
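As a self-contained sketch of the same call, assuming a hypothetical data frame and hand-picked bounds (in your code, lwr and upr would come from your own outlier rule):

```r
library(dplyr)

# Hypothetical data and bounds, for illustration only
outliers_demo <- data.frame(sold_fee = c(1, 5, 6, 7, 100))
lwr <- 2
upr <- 50

v <- "sold_fee"

# .data[[v]] looks up the column whose name is stored in v
flagged <- outliers_demo %>%
  filter(.data[[v]] > upr | .data[[v]] < lwr)

print(flagged)
```

Here lwr and upr are ordinary variables in the calling environment, so they need no special treatment; only the column reference goes through .data.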
In select() we should use the all_of() function. In your example, it would be:
v = "sold_fee"
select(quotedate, factor, segment, all_of(v))
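A runnable version of this call might look as follows. The data frame and the extra column other_fee are hypothetical; only quotedate, factor, segment and sold_fee come from your question:

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- data.frame(
  quotedate = as.Date(c("2020-01-01", "2020-01-02")),
  factor    = c("a", "b"),
  segment   = c("x", "y"),
  sold_fee  = c(10, 20),
  other_fee = c(1, 2)   # hypothetical column that should be dropped
)

v <- "sold_fee"

# Fixed columns are given unquoted; the variable one goes through all_of()
selected <- df %>%
  select(quotedate, factor, segment, all_of(v))

print(names(selected))
```

all_of() also raises an error if v names a column that does not exist, which helps catch typos in the list of analysis variables.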
That said, you can now adapt your code so that the sold_fee name is read from a character vector containing your analysis variables, and loop over them using the forms of filter() and select() shown above.
As a final note, you could store the data frame with the outlier columns for each variable in a list, and then print everything at once after the loop has finished, as in:
library(dplyr)
vars4analysis = c("sold_fee") # List all the variables you want to analyze for outliers here
outliers_info = list()
for (v in vars4analysis) {
outliers = ... # filter command here
outliers_info[[v]] = outliers %>% ... # select command here
}
print(outliers_info) # This will show the info about the outliers for each analysis variable
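For reference, here is one way the skeleton above could be filled in end to end. This is only a sketch: the data frame, the second variable other_fee, and the 1.5 * IQR rule used to compute lwr and upr are assumptions of mine, not taken from your code, so substitute your own bounds.

```r
library(dplyr)

# Hypothetical data; only the column names follow the question
df <- data.frame(
  quotedate = as.Date("2020-01-01") + 0:5,
  factor    = c("a", "a", "b", "b", "c", "c"),
  segment   = c("x", "y", "x", "y", "x", "y"),
  sold_fee  = c(10, 11, 12, 13, 14, 500),
  other_fee = c(-90, 5, 6, 7, 8, 9)   # hypothetical second analysis variable
)

vars4analysis <- c("sold_fee", "other_fee")
outliers_info <- list()

for (v in vars4analysis) {
  # Assumed outlier rule: 1.5 * IQR fences around the quartiles
  q   <- quantile(df[[v]], c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  lwr <- q[[1]] - 1.5 * iqr
  upr <- q[[2]] + 1.5 * iqr

  # SE in filter(): .data pronoun; SE in select(): all_of()
  outliers <- df %>% filter(.data[[v]] > upr | .data[[v]] < lwr)
  outliers_info[[v]] <- outliers %>%
    select(quotedate, factor, segment, all_of(v))
}

print(outliers_info)
```

Each list element is named after its variable, so outliers_info[["sold_fee"]] gives the outlier rows for that variable alone.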
(*) You can read more about non-standard evaluation here: http://adv-r.had.co.nz/Computing-on-the-language.html