
I am struggling with my code and I can't figure out what the actual problem is.

Just to give you a bit of context: I am trying to write some code that will help me perform automatic EDA using ggstatsplot. I would like to select a target variable in my dataset and, on that basis, have the program loop over the remaining columns performing different bivariate analyses depending on the variable types: it should use ggscatterstats if both are numeric, ggbetweenstats if one is a factor and the other is numeric, and ggbarstats if both are factors. I am attaching a short dataset I am using for these experiments.
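The type-based rule above can be sketched as a small base-R helper that only decides which ggstatsplot function applies, without drawing anything. The name `pick_ggstats_fun` is hypothetical, and returning `NA` for column types the rule does not cover (e.g. character or date columns) is a design choice of this sketch; the question's own code sends every such case to ggbarstats instead. The built-in `iris` data stands in for the dataset, which is not shown.

```r
# Hypothetical helper (not part of ggstatsplot): given two columns, return the
# name of the ggstatsplot function the rule calls for.
pick_ggstats_fun <- function(x, y) {
  if (is.numeric(x) && is.numeric(y)) {
    "ggscatterstats"    # numeric vs numeric
  } else if ((is.numeric(x) && is.factor(y)) || (is.factor(x) && is.numeric(y))) {
    "ggbetweenstats"    # one factor, one numeric
  } else if (is.factor(x) && is.factor(y)) {
    "ggbarstats"        # factor vs factor
  } else {
    NA_character_       # types the rule does not cover
  }
}

# Decide the plot type for every non-target column of iris,
# with the factor Species standing in for the target.
target <- "Species"
others <- setdiff(names(iris), target)
plan <- vapply(others,
               function(v) pick_ggstats_fun(iris[[v]], iris[[target]]),
               character(1))
plan  # all four numeric columns map to "ggbetweenstats"
```

Keeping the decision separate from the plotting makes it easy to check the dispatch on its own before any ggstatsplot call is involved.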

[Image: what the dataset looks like]

The code I am using is the following (suppose the target is Upselling, a factor, so the code should only produce ggbetweenstats and ggbarstats plots):

    library(ggstatsplot)
    df <- dataset
    target_var <- dataset$Upselling
    for (var in 1:ncol(df)) {
      if (is.numeric(df[[var]]) && is.numeric(target_var)) {
        plots <- ggscatterstats(data = df, x = var, y = target_var)
      } else if (is.numeric(df[[var]]) && is.factor(target_var) || is.factor(df[[var]]) && is.numeric(target_var)) {
        plots <- ggbetweenstats(data = df, x = var, y = target_var)
      } else {
        plots <- ggbarstats(data = df, x = var, y = target_var)
      }
      print(plots)
    }

The error I am getting is the following:

    Error in `select()`:
    ! Can't subset columns that don't exist.
    ✖ Column `var` doesn't exist.

Could you please help? Thank you so much!

  • Welcome to SO! It would be easier to help you if you provide [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. To share your data, you could type `dput(NAME_OF_DATASET)` into the console and copy & paste the output starting with `structure(....` into your post. If your dataset has a lot of observations you could do e.g. `dput(head(NAME_OF_DATASET, 10))` for the first ten rows of data. – stefan Jan 12 '23 at 08:52
  • This said, the issue is most likely that `ggstatsplot` uses non-standard evaluation and expects unquoted column names as arguments whereas you provide a `vector` (target_var) or just the position of the column (var). – stefan Jan 12 '23 at 08:54
  • Have a look at FAQ#6 (https://indrajeetpatil.github.io/ggstatsplot/articles/web_only/faq.html#how-can-i-use-ggstatsplot-functions-in-a-for-loop) and see if that solves your issue. – Indrajeet Patil Jan 12 '23 at 08:57
  • I think this is an oft-duplicated Q. I suspect something like `rlang::ensym(!!var)` will solve it. – IRTFM Jan 12 '23 at 09:52
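Following the comments and the FAQ linked above, a minimal sketch of the loop that iterates over column names (strings) rather than positions, and injects them with `!!` so that ggstatsplot receives the column name instead of the literal symbol `var`. Assumptions: `iris` stands in for the unshown dataset, with its factor column Species in the role of Upselling; the names `target`, `plots` and `p` are illustrative. The extra branch swaps `x` and `y` because ggbetweenstats expects the grouping factor as `x` and the numeric variable as `y`, which the original call did not guarantee.

```r
library(ggstatsplot)

# Stand-in data: the question's dataset is not shown, so iris is used here,
# with the factor column Species playing the role of Upselling.
df <- iris
target <- "Species"  # the column *name* as a string, not df$Species

plots <- list()
# Loop over column names (not positions), skipping the target itself.
for (var in setdiff(names(df), target)) {
  x_col <- df[[var]]
  y_col <- df[[target]]
  # `!!` injects the string stored in var/target into the argument, so
  # ggstatsplot receives e.g. "Sepal.Length" instead of the symbol `var`.
  if (is.numeric(x_col) && is.numeric(y_col)) {
    p <- ggscatterstats(data = df, x = !!var, y = !!target)
  } else if (is.factor(x_col) && is.numeric(y_col)) {
    p <- ggbetweenstats(data = df, x = !!var, y = !!target)
  } else if (is.numeric(x_col) && is.factor(y_col)) {
    # ggbetweenstats wants the grouping factor as x and the numeric
    # variable as y, so the roles are swapped in this branch.
    p <- ggbetweenstats(data = df, x = !!target, y = !!var)
  } else {
    # Assumes any remaining columns are factors, as in the question.
    p <- ggbarstats(data = df, x = !!var, y = !!target)
  }
  plots[[var]] <- p
  print(p)
}
```

Storing each plot in a named list, rather than overwriting a single `plots` object, also keeps every result available after the loop finishes.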
