How to set the stat_function in for loop to plot two graphs with normal distribution, central and variance parameters

Question

I would like to create the following plots in parallel

I have used the following code using the wide format dataset:

sumstatz_1 <- data.frame(whichstat = c("mean",
                                     "sd upr", 
                                     "sd lwr", 
                                     "median"),
                       value     = c(mean(data$score),
                                     mean(data$score)+sd(data$score),
                                     mean(data$score)-sd(data$score), 
                                     median(data$score)))


plot2 = ggplot(data, aes(x = score)) +                           
  geom_histogram(aes(y =..density..),
                 breaks = seq(0, max(data$score), by = 5), 
                 colour = "black", 
                 fill = "white") + stat_function(fun = dnorm, 
                                   args = list(mean = mean(data$score, na.rm = TRUE), 
                                   sd = sd(data$score, na.rm = TRUE)), 
                                   colour = 'black', size = 1) + 
  labs(title='score', x='score', y= 'Distribution') +
  geom_vline(data=sumstatz_1,aes(xintercept = value,
                               linetype = whichstat,
                               col = whichstat),size=1)

I have taken it by changing just the variable of interest to create the second graph. Anyway, I would like to create the same result by using an interactive graph. Here I have set up the following code that I have converted into a long format for convenience and then I have coded the following for loop:

for (i in 101:ncol(long)) {
    p <- ggplot(long, aes(x = points)) +                           
      geom_histogram(aes(y =..density..), 
                     breaks = seq(0, 50, by = 3), 
                     colour = "black", 
                     fill = "white") + facet_grid(.~ score)
} for (j in seq_along(long$score)){
   p +
      stat_function(fun = dnorm[???], 
                    args = list(mean = mean(long$points[long$score == 'j'], na.rm = TRUE), 
                                sd = mean(long$points[long$score == 'j'], na.rm = TRUE)), 
                    colour = 'black', size = 1)
  }

print(p)

But I have no clue how to set parameters in stat_function() nor wether it is possible to use in a for loop or another iterative method. Would you have possibly any suggestion?

Here the dataset

structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 
7, 8, 8, 9, 9, 10, 10), score = structure(list(MM_score = c("score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame")), points = c(53, 13.25, 17.5, 1.59090909090909, 
48.5, 6.92857142857143, 40, 3.63636363636364, 46, 7.07692307692308, 
38, 4.47058823529412, 14.5, 1.61111111111111, 19.5, 3.54545454545455, 
37.5, 3.40909090909091, 5.5, 0.916666666666667)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), .rows = structure(list(
        1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16, 17:18, 
        19:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))

Hi! I didn't get what you want to do with the normal plots... do you want to plot many normals in the same plot then print it with all? Or do you want to plot only the normal that best fits with your data? Also, what's 'long' object? — Lucas, Feb 22 '23 at 13:23
Hello. At this very moment, I could not edit my post. The reason you are visualizing two of them is because I readapted the code you can see in the post - meaning that I have copied the code again as it is and replacing interested value - using the 'score_2' as a variable whose I would like to plot the distribution. My idea would be to use a loop to produce this grap and thus a way through which to adapt the stat_function for this purpose. — 12666727b9, Feb 22 '23 at 13:33
Here it's well explained how to do it https://stackoverflow.com/questions/1376967/using-stat-function-and-facet-wrap-together-in-ggplot2-in-r — Lucas, Feb 22 '23 at 15:58
I have already tried something like this on my own, but it didn't work. This is why I am wondering of there is another way through — 12666727b9, Feb 23 '23 at 13:22

Lucas · Accepted Answer · 2023-02-24T19:59:52.783

Try using this code:


dados <- structure(list(ID = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 
7, 8, 8, 9, 9, 10, 10), score = structure(list(MM_score = c("score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1", "score_2", "score_1", "score_2", "score_1", "score_2", 
"score_1")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", 
"data.frame")), points = c(53, 13.25, 17.5, 1.59090909090909, 
48.5, 6.92857142857143, 40, 3.63636363636364, 46, 7.07692307692308, 
38, 4.47058823529412, 14.5, 1.61111111111111, 19.5, 3.54545454545455, 
37.5, 3.40909090909091, 5.5, 0.916666666666667)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -20L), groups = structure(list(
    ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), .rows = structure(list(
        1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16, 17:18, 
        19:20), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -10L), .drop = TRUE))

dados <- dados %>% ungroup() %>% mutate(score = factor(score$MM_score))


grid <- with(dados, seq(min(points), max(points), length = 100))
normaldens <- data.frame()
sumstatz_1 <- data.frame()
for(i in levels(dados$score)){
  aux <- dados %>% filter(score == i) %>% 
    summarise(mean = mean(points), sd = sd(points), median = median(points))
  normaldens <- rbind(normaldens,data.frame(score = rep(i,100),
                                            points = grid,
                                            density = dnorm(grid, aux$mean, aux$sd)))
  sumstatz_1 <- rbind(sumstatz_1,
                      data.frame(score = rep(i,4),
                                 whichstat = c("mean",
                                               "sd upr", 
                                               "sd lwr", 
                                               "median"),
                                 value = c(aux$mean,
                                     aux$mean+aux$sd,
                                     aux$mean-aux$sd, 
                                     aux$median)))
}

ggplot(dados, aes(x = points))  + 
  geom_histogram(aes(y = ..density..), 
                     breaks = seq(0, 50, by = 3), 
                     colour = "black", 
                     fill = "white") + 
  geom_line(aes(y = density), data = normaldens, colour = "red") +
  geom_vline(data=sumstatz_1,aes(xintercept = value,
                               linetype = whichstat,
                               col = whichstat),size=1)+
  facet_wrap(~score)

If you have any questions, please ask me!!

Thank you so much. It is always a pleasure to learn something new. If you want you can a look to the solution I come up with. By the way, I would this answer: could you olease explain in few words what the function with() and grip() do? Thanks — 12666727b9, Mar 11 '23 at 18:11

score 0 · Answer 2 · answered Mar 11 '23 at 18:12

This is the solution I come up with on my own, if it might be useful for somebody else:

sumstatz = NULL
df = NULL   
name = NULL
dim = c(5, 1)
par(mfrow = c(1, 2))
plot = NULL
for (j in 1:length(dim)){
  name[j] = unique(long$MM_score)[j]
  df[[name[j]]]= long[long$MM_score == unique(long$MM_score)[j] & long$points,]
  sumstatz[[name[j]]] <- data.frame(whichstat = c("median","qupr","qlwr"), 
                                     value = c(mean(df[[j]]$points),
                                               mean(df[[j]]$points) + sd(df[[j]]$points),
                                               mean(df[[j]]$points)-sd(df[[j]]$points)))
  plot[[name[j]]] = ggplot(df[[j]], aes(x = points)) +                           
          geom_histogram(aes(y =..density..),breaks = seq(min(df[[j]]$points), 
                         max(df[[j]]$points), by = dim[j]), colour = "black", 
                         fill = "white") + 
          stat_function(fun = dnorm,
                        args = list(mean = mean(df[[j]]$points, na.rm = TRUE),
                                    sd = sd(df[[j]]$points, na.rm = TRUE)), 
                                    colour = 'black', size = 1) + 
          labs(title= unique(long$MM_score)[j], x= unique(long$MM_score)[j], 
               y= 'Distribution') + geom_vline(data=sumstatz[[j]],
                                               aes(xintercept = value,
                                                   linetype = whichstat,
                                                   col = whichstat),size=1)
}

ggarrange(plot[[1]], plot[[2]])

If someone see some redundancies or would like to add some alternatives. That would be a pleasure to learn something else — 12666727b9, Mar 11 '23 at 18:16

How to set the stat_function in for loop to plot two graphs with normal distribution, central and variance parameters

2 Answers2