I am using stat_qq plots to assess the normality of a given variable. I want to colour each plot by a second variable, so that I can see how each factor level of that second variable sits within the QQ plot of the first. I can achieve this when I produce the plots one by one. However, I would like to use purrr::map to iterate through a list of secondary variables. I have four secondary variables, so I would prefer not to duplicate the code several times.

I have spent considerable time looking online and also reading through the questions and answers on here.

I use the following code. After the sample data, which makes the example reproducible, I first create a gglayers object. It holds the ggplot layers and theme settings shared by every plot, so that I do not have to repeat them.

RSKPH2 <- data.frame(
release_speed_kph =c(87.42, 141.37, 133.41, 89.12, 96.64, 141.39, 137.16, 98.09, 144.22, 101.19),
batting_hand = c("left", "right", "right", "right", "right", "left", "right", "right", "left", "right"),
bowling_hand = c("right", "right", "left", "right", "left", "right", "right", "right", "left", "right"),
bowling_type = c("spin", "pace", "pace", "spin", "spin", "pace", "pace", "spin", "pace", "pace"),
wicket = c("Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "Wicket", "Wicket")
)
gglayers <- list(stat_qq(),
                 stat_qq_line(),
                 theme_classic2(),
                 theme(plot.title = element_text(hjust = 0.5)),
                 xlab("Theoretical Quantiles"),
                 ylab("Sample Quantiles"))

The following object informs R which title to use for each plot:

RSKPH2_Titles <- c("Release Speed Kph with Batting Hand QQ Plot",
                  "Release Speed Kph with Bowling Hand QQ Plot",
                  "Release Speed Kph with Bowling Type QQ Plot",
                  "Release Speed Kph with Wicket QQ Plot")

Then, the following object creates a vector of the names of the secondary variables I wish to use - to allow purrr::map to iterate through:

RSKPH2_names <- c("batting_hand", "bowling_hand", "bowling_type", "wicket")

Now that the required objects have been created, I use the following code in an attempt to enable purrr::map to iterate through:

TP2 <- RSKPH2 %>% 
  map( ~ {ggplot(RSKPH2, 
                 aes(sample = release_speed_kph)) + geom_line(aes(color = RSKPH2_names)) + labs(title = RSKPH2_Titles) + gglayers}
  )
TP2

I get the following error:

Error in geom_line():
! Problem while computing aesthetics.
ℹ Error occurred in the 1st layer.
Caused by error in check_aesthetics():
! Aesthetics must be either length 1 or the same as the data (194368)
✖ Fix the following mappings: colour

I use the following code to successfully create one plot:

RSKPH_wicket <- ggplot(RSKPH2, 
aes(sample = release_speed_kph, colour = wicket)) +
  ggtitle("Release Speed Kph with Wicket QQ Plot") + gglayers
RSKPH_wicket

EDIT

When I use the code from the answer on the larger data, the below plot results. I have taken the second of the four plots as an example. An additional item has been added to the legend, which is not found in the data. How can I remove this?

[Image: the second of the four plots, showing the extra legend entry]

stefan
  • You are mapping `RSKPH2_names` on the color aes, which according to your code is a length 4 vector. Not sure what you are trying to achieve with that. Also, your `map` doesn't make much sense, as you are looping over the columns of your dataset. Perhaps you want `RSKPH2_names %>% map(...)` and `color = .data[[.x]]` to map the "secondary" variable on the color aes. For more help I would suggest providing [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. – stefan Feb 14 '23 at 09:31
  • Hi. I added a reproducible example – Nicholas Bradley Feb 14 '23 at 09:59
  • I think it would be very helpful to first create *one* plot outside of a loop, define what you want to show and then think about how we can create many similar plots in a loop. `geom_line()` needs `x` and `y` arguments. Can you show us one working plot as desired output? – TimTeaFan Feb 14 '23 at 10:01
  • Thanks. I have added the code I use to create a single plot – Nicholas Bradley Feb 14 '23 at 10:10
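
The point made in the first comment can be checked with a small base R sketch on the sample data from the question. It is only an illustration of why the original call fails, not a fix: a data frame is a list of columns, so piping it into `map()` runs the function once per column, and the length-4 vector of names matches neither length 1 nor the number of rows. The helper names `n_iterations`, `visited`, `length_ok` and `colour_values` are made up for this example.

```r
# Sample data from the question.
RSKPH2 <- data.frame(
  release_speed_kph = c(87.42, 141.37, 133.41, 89.12, 96.64, 141.39, 137.16, 98.09, 144.22, 101.19),
  batting_hand = c("left", "right", "right", "right", "right", "left", "right", "right", "left", "right"),
  bowling_hand = c("right", "right", "left", "right", "left", "right", "right", "right", "left", "right"),
  bowling_type = c("spin", "pace", "pace", "spin", "spin", "pace", "pace", "spin", "pace", "pace"),
  wicket = c("Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "Wicket", "Wicket")
)
RSKPH2_names <- c("batting_hand", "bowling_hand", "bowling_type", "wicket")

# A data frame is a list of columns, so iterating over it visits every
# column (including release_speed_kph), not the four secondary names.
n_iterations <- length(RSKPH2)
visited <- names(RSKPH2)

# aes(color = RSKPH2_names) passes the vector of names itself to ggplot2;
# its length is neither 1 nor the number of rows, hence the error.
length_ok <- length(RSKPH2_names) %in% c(1, nrow(RSKPH2))

# Looking a column up by its name is roughly what .data[[.x]] does
# inside aes(): it yields one value per row.
colour_values <- RSKPH2[["wicket"]]
length(colour_values) == nrow(RSKPH2)
```

Iterating over `RSKPH2_names` instead gives one iteration per secondary variable, which is what the comment suggests.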

1 Answer

As far as I understand your question, you want to create a separate plot for each of your RSKPH2_names columns. If that is correct, then you have to loop over RSKPH2_names rather than over the data frame. Additionally, as you probably want a different title for each plot according to RSKPH2_Titles, I would use map2. Finally, I'm not sure how you want to add a geom_line to your qq plot. Perhaps you want a qq plot per category overlaid on a plot for the whole sample? Assuming that this is what you want, I used stat_qq and stat_qq_line in the ggplot2 code:

library(ggplot2)
library(purrr)

TP2 <- map2(RSKPH2_names, RSKPH2_Titles, ~
    ggplot(
      RSKPH2,
      aes(sample = release_speed_kph)
    ) +
      stat_qq_line(aes(color = .data[[.x]])) +
      stat_qq(aes(color = .data[[.x]])) +
      #geom_line(aes(color = .data[[.x]])) +
      labs(title = .y) +
      gglayers
  )
TP2[[1]]

[Image: the first plot, TP2[[1]], release speed QQ plot coloured by batting hand]
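
A possible variation on the code above, offered as a sketch rather than as part of the original answer: naming the input vector with `purrr::set_names()` makes `map2()` return a named list, so each plot can be retrieved by variable name instead of by position. To stay self-contained the block repeats the sample data, puts the colour mapping in the main `aes()` as in the question's single working plot, and uses `theme_classic()` as a stand-in for `theme_classic2()`, which is not part of ggplot2 (it is provided by ggpubr, as far as I know). The name `TP2_named` is made up.

```r
library(ggplot2)
library(purrr)

RSKPH2 <- data.frame(
  release_speed_kph = c(87.42, 141.37, 133.41, 89.12, 96.64, 141.39, 137.16, 98.09, 144.22, 101.19),
  batting_hand = c("left", "right", "right", "right", "right", "left", "right", "right", "left", "right"),
  bowling_hand = c("right", "right", "left", "right", "left", "right", "right", "right", "left", "right"),
  bowling_type = c("spin", "pace", "pace", "spin", "spin", "pace", "pace", "spin", "pace", "pace"),
  wicket = c("Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "No Wicket", "Wicket", "No Wicket", "Wicket", "Wicket")
)
RSKPH2_names <- c("batting_hand", "bowling_hand", "bowling_type", "wicket")
RSKPH2_Titles <- paste(
  "Release Speed Kph with",
  c("Batting Hand", "Bowling Hand", "Bowling Type", "Wicket"),
  "QQ Plot"
)

# set_names() names the vector by its own values; map2() carries the
# names of its first argument over to the resulting list.
TP2_named <- map2(
  set_names(RSKPH2_names), RSKPH2_Titles,
  function(var, title) {
    ggplot(RSKPH2, aes(sample = release_speed_kph, colour = .data[[var]])) +
      stat_qq() +
      stat_qq_line() +
      labs(
        title = title,
        x = "Theoretical Quantiles",
        y = "Sample Quantiles",
        colour = var  # label the legend with the variable name
      ) +
      theme_classic() +  # stand-in for theme_classic2()
      theme(plot.title = element_text(hjust = 0.5))
  }
)

names(TP2_named)
```

With this, `TP2_named[["wicket"]]` returns the wicket plot, and the names are available to any later step that iterates over the list.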

stefan
  • Hi Nicholas. First. I added your edit to my answer as an edit to your question. Second, if you think your data does not contain the "additional" item, I'm pretty sure it does. `ggplot2` will not add any categories to your data. I would suggest to check your data once more, e.g. using `unique(RSKPH2$bowling_hand)`. – stefan Feb 14 '23 at 11:28
  • I am struggling to use ggsave correctly to save the four plots. I have looked online and on Stack Overflow, but haven't had any success. I use the following code: `ggplot2::ggsave(filename = "myplot.pdf", plot = map(TP2, print), device = "pdf")` – Nicholas Bradley Feb 14 '23 at 14:16
  • If you want to save all four plots in one pdf try `pdf("myplot.pdf"); purrr::walk(TP2, print); dev.off()`. – stefan Feb 14 '23 at 14:27
  • How would I save each plot as a separate pdf or separate png? Thanks – Nicholas Bradley Feb 14 '23 at 16:20
  • (: In that case we could use `ggsave`. Try `purrr::iwalk(TP2, ~ ggplot2::ggsave(paste0("myplot", .y, ".pdf"), plot = .x))`. Here I use `iwalk` for convenience, to add a number to the filename. – stefan Feb 14 '23 at 16:50
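
Returning to the extra legend entry discussed in the first comment of this thread: the check suggested there can be extended a little. The sketch below uses a hypothetical `bowling_hand` vector, because the larger data set is not shown. Which stray value actually causes the extra entry is unknown, so the trailing space and the `NA` here are assumptions chosen only to illustrate two common causes.

```r
# Hypothetical values: "right " (trailing space) and NA would each
# show up as a separate legend entry.
bowling_hand <- c("right", "left", "right ", "left", NA, "right")

# Count every distinct value, including missing ones.
table(bowling_hand, useNA = "ifany")

# One possible clean-up: strip surrounding whitespace so that
# "right " and "right" collapse into a single category.
cleaned <- trimws(bowling_hand)
unique(cleaned)

# If the extra entry turns out to be NA, one option is to drop those
# rows before plotting (keeping them may also be a legitimate choice).
speeds <- c(87.42, 141.37, 133.41, 89.12, 96.64, 141.39)
df <- data.frame(release_speed_kph = speeds, bowling_hand = cleaned)
df_complete <- df[!is.na(df$bowling_hand), ]
nrow(df_complete)
```

Applied to the real data, the `table()` call on `RSKPH2$bowling_hand` should show which value produces the unexpected legend entry.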