
I have a dataset containing answers to a survey (q1:q4), alongside characteristics of respondents (Project, Level).

set.seed(123)  # fix the seed so the sampled data are reproducible
data <- data.frame(Project = paste0("P", sample(1:3, 10, replace = TRUE)),
                   Level = sample(1:3, 10, replace = TRUE),
                   q1 = sample(1:10, 10, replace = TRUE),
                   q2 = sample(1:10, 10, replace = TRUE),
                   q3 = sample(1:10, 10, replace = TRUE),
                   q4 = sample(1:10, 10, replace = TRUE)
                   )

I would like to create nice-looking scatterplots with ggscatterstats (from the ggstatsplot package) showing the correlation between q1 and each of the other three questions, with respondents grouped by level and by project.

I have developed this function:

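library(dplyr)        # %>%, select(), mutate_if()
library(ggplot2)      # ggsave()
library(ggstatsplot)  # ggscatterstats()

var_look2 <- function(data) {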
  var_names <- data %>% select(q1:q4) %>% colnames()
  levels <- c(1:3)
  projects <-   unique(data$Project)
  
  df_cor <- data %>% mutate_if(is.character, as.factor)
  df_cor <- df_cor %>% mutate_if(is.factor, as.numeric)
  
  for(var in var_names) {
    for (level in levels) {
      data_subset <- subset(df_cor, Level == 1)
      
      for(project in projects) {
        data_subset <- subset(df_cor, Project == project)
        
        n <- nrow(data_subset)
        
        p <- ggscatterstats(
          data = data_subset,
          type = "non-parametric",
          x = {{var}},
          y = q1,
          bf.message = FALSE,
          title = paste(project, "scatterplot level", level, "N =", n),
          marginal = TRUE
        )

        ggsave(filename = paste0(project, " ", var, " ", level, " ", " .jpeg"), plot = p, 
               width = 1000, height = 1000, units = "px", scale = 1)
      }
    }
  }
}

Problem 1: When I run var_look2(data), I get the following output:

> var_look2(data)
# Error:
# ! Problem while setting up layer.
# ℹ Error occurred in the 3rd layer.
# Caused by error in `$<-.data.frame`:
# ! replacement has 1 row, data has 0
# Run `rlang::last_trace()` to see where the error occurred.

After switching the loops on and off one at a time, I figured out that the problem is with this line:

data_subset <- subset(df_cor, Project == project)

as this line generates an empty data_subset. Any ideas?
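
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the two mutate_if calls turn Project into integer codes (1, 2, 3), while projects is still taken from the original data and holds the labels "P1", "P2", "P3", so Project == project never matches. The base-R snippet below reproduces that mismatch on a small made-up data frame; toy, df_num and their values are hypothetical and only serve the illustration.

```r
# Hypothetical toy data standing in for the survey data frame.
toy <- data.frame(
  Project = c("P1", "P2", "P3", "P1"),
  q1 = c(4, 7, 2, 9),
  stringsAsFactors = FALSE
)

# Same effect as mutate_if(is.character, as.factor) followed by
# mutate_if(is.factor, as.numeric): the labels become integer codes.
df_num <- toy
df_num$Project <- as.numeric(as.factor(df_num$Project))

# The loop values still come from the unconverted data: "P1", "P2", "P3".
projects <- unique(toy$Project)

# Comparing integer codes with the original labels matches nothing.
empty_subset <- subset(df_num, Project == projects[1])
nrow(empty_subset)  # 0

# Subsetting the unconverted column finds the rows.
ok_subset <- subset(toy, Project == projects[1])
nrow(ok_subset)     # 2
```

If this is indeed the cause, subsetting before the conversion, or taking projects from df_cor instead of data, should avoid the empty subset.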

Problem 2: If I remove the line data_subset <- subset(df_cor, Project == project), ggsave does what I expect.

However, what I actually want is to plot these scatterplots grouped by level and/or project, so that readers can compare them at a glance.

To do this, instead of calling ggsave at the end, I would like to build a list containing all the plots, named appropriately, that I can eventually feed to ggplot. I tried this:

p <- ggscatterstats(
  data = data_subset,
  type = "non-parametric",
  x = {{var}},
  y = q1,
  bf.message = FALSE,
  title = paste(project, var, "scatterplot level", level, "N =", n),
  marginal = TRUE
)

plot_list[[paste0(project, " ", var, " ", level)]] <- p

However, if I run

plot_list <- var_look(GES_BGD1)

plot_list ends up as a NULL object.

I expected plot_list to contain all the scatterplots described above. This puzzles me, because ggsave does save each scatterplot, so the ggscatterstats call itself does not seem to be the issue.
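
One likely reason for the NULL, stated as an assumption since var_look itself is not shown: a function whose last expression is a for loop returns NULL, so a list filled inside the function is never handed back unless it is created before the loops and returned at the end. The sketch below shows that pattern in base R. collect_plots and make_plot are hypothetical names, and make_plot is only a placeholder for the real plotting call.

```r
# Hypothetical helper: build a named list inside nested loops and return it.
# make_plot is a placeholder for the real plotting call (e.g. ggscatterstats).
collect_plots <- function(data, vars, make_plot) {
  plot_list <- list()                      # create the list before looping

  for (var in vars) {
    for (level in sort(unique(data$Level))) {
      for (project in unique(data$Project)) {
        # Filter on both conditions at once so neither overwrites the other.
        data_subset <- data[data$Level == level & data$Project == project, ]
        if (nrow(data_subset) == 0) next   # skip empty combinations

        key <- paste(project, var, level)
        plot_list[[key]] <- make_plot(data_subset, var)
      }
    }
  }

  plot_list                                # return the list explicitly
}

# Toy usage with a stand-in "plot" that just records the subset size.
toy <- data.frame(
  Project = c("P1", "P1", "P2"),
  Level = c(1, 2, 1),
  q1 = c(3, 5, 8),
  q2 = c(6, 1, 4),
  stringsAsFactors = FALSE
)
res <- collect_plots(toy, "q2", function(d, v) nrow(d))
names(res)
```

In the real function, make_plot would wrap the ggscatterstats call from the question, and the result would be captured with something like plot_list <- collect_plots(data, c("q2", "q3", "q4"), make_plot).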

edodar
  • Can you please edit your code to make it reproducible? There are objects called (e.g `mission`) that haven't been defined and it's not clear what the difference is between `var_look` and `var_look2` as you only provided code for one of them. – nrennie Apr 02 '23 at 22:12
  • Hey @nrennie, thanks - I edited the code. Sorry about that. – edodar Apr 03 '23 at 03:43
