I have a dataset containing answers to a survey (q1:q4), alongside characteristics of respondents (Project, Level).
library(dplyr)
library(ggplot2)
library(ggstatsplot)

data <- data.frame(
  Project = paste0("P", sample(1:3, 10, replace = TRUE)),
  Level = sample(1:3, 10, replace = TRUE),
  q1 = sample(1:10, 10, replace = TRUE),
  q2 = sample(1:10, 10, replace = TRUE),
  q3 = sample(1:10, 10, replace = TRUE),
  q4 = sample(1:10, 10, replace = TRUE)
)
I would like to create nice-looking scatterplots using ggscatterstats (from the ggstatsplot package) showing the correlation between q1 and the other three questions, grouping respondents by level and by project.
I have developed this function:
var_look2 <- function(data) {
  var_names <- data %>% select(q1:q4) %>% colnames()
  levels <- c(1:3)
  projects <- unique(data$Project)
  df_cor <- data %>% mutate_if(is.character, as.factor)
  df_cor <- df_cor %>% mutate_if(is.factor, as.numeric)
  for (var in var_names) {
    for (level in levels) {
      data_subset <- subset(df_cor, Level == 1)
      for (project in projects) {
        data_subset <- subset(df_cor, Project == project)
        n <- nrow(data_subset)
        p <- ggscatterstats(
          data = data_subset,
          type = "non-parametric",
          x = {{var}},
          y = q1,
          bf.message = FALSE,
          title = paste(paste(project, "scatterplot level", level, "N =", n)),
          marginal = TRUE
        )
        ggsave(filename = paste0(project, " ", var, " ", level, " ", " .jpeg"), plot = p,
               width = 1000, height = 1000, units = "px", scale = 1)
      }
    }
  }
}
Problem 1:
When I run var_look2(data), I get the following output:
> var_look2(data)
# Error:
# ! Problem while setting up layer.
# ℹ Error occurred in the 3rd layer.
# Caused by error in `$<-.data.frame`:
# ! replacement has 1 row, data has 0
# Run `rlang::last_trace()` to see where the error occurred.
After switching each of the loops on and off in turn, I figured out that the problem is this line:
data_subset <- subset(df_cor, Project == project)
since it produces an empty data_subset. Any ideas?
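A quick check along the following lines suggests one possible cause, assuming the two mutate_if calls in the function turn the Project labels into integer codes before the subset is taken. This is only a sketch in base R: the toy data frame and its values are made up for illustration, and as.numeric(as.factor(...)) stands in for the two mutate_if steps.

```r
# Hypothetical toy data, not the survey data from the question
toy <- data.frame(
  Project = c("P1", "P2", "P3", "P1"),
  q1 = c(4, 7, 2, 9),
  stringsAsFactors = FALSE
)

# Stand-in for mutate_if(is.character, as.factor) followed by
# mutate_if(is.factor, as.numeric): the labels become integer codes
toy_num <- toy
toy_num$Project <- as.numeric(as.factor(toy_num$Project))

# The loop values still come from the original character column
projects <- unique(toy$Project)

# Comparing integer codes with a label such as "P1" matches no rows
rows_after_conversion <- nrow(subset(toy_num, Project == projects[1]))

# Comparing labels with labels matches the expected rows
rows_before_conversion <- nrow(subset(toy, Project == projects[1]))

print(toy_num$Project)
print(rows_after_conversion)
print(rows_before_conversion)
```

If the same thing happens in var_look2, subsetting before the conversion, or converting only the columns that need to be numeric, might avoid the empty data_subset.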
Problem 2:
If I remove the line data_subset <- subset(df_cor, Project == project), ggsave does what I expect.
However, what I actually want is to be able to plot these scatterplots grouped by level and/or project to allow readers to do immediate comparisons.
To do this, instead of calling ggsave at the end, I would like to build a list containing all the plots, named appropriately, so that I can eventually pass it to ggplot. I tried this:
p <- ggscatterstats(
  data = data_subset,
  type = "non-parametric",
  x = {{var}},
  y = q1,
  bf.message = FALSE,
  title = paste(project, " ", {{var}}, " scatterplot level ", level, "N =", n),
  marginal = TRUE)
plot_list[[paste0(project, " ", var, " ", level)]] <- p
However, if I run the command:
plot_list <- var_look(GES_BGD1)
plot_list is NULL.
I was expecting plot_list to contain all the scatterplots described above. This seems odd to me, because the ggsave command does save the scatterplot, so the ggscatterstats call is not the issue.
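One likely explanation is that a for loop in R evaluates to NULL, so a function whose last expression is a for loop returns NULL unless the list is created before the loops and returned explicitly. A minimal sketch of that pattern follows. The helper name collect_plots is hypothetical, and a character string stands in for the ggscatterstats call so that the example runs without any packages.

```r
# Hypothetical helper: builds a named list, one entry per combination
collect_plots <- function(projects, vars, levels) {
  plot_list <- list()  # created once, before the loops
  for (var in vars) {
    for (level in levels) {
      for (project in projects) {
        # Placeholder: the real code would call ggscatterstats() here
        p <- paste("plot of", var, "for", project, "level", level)
        plot_list[[paste(project, var, level)]] <- p
      }
    }
  }
  plot_list  # returned explicitly; the for loop itself evaluates to NULL
}

plots <- collect_plots(c("P1", "P2"), c("q2", "q3"), 1:3)
print(length(plots))
print(names(plots)[1])
```

With the placeholder swapped for the real plotting call, the returned list would hold one plot per project, variable, and level, keyed by name.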