
I'm running a study where I want to display the results using facet_grid from ggplot2.

My test data can be found here.

The data has six columns: Ausfallrate (missing rate), PFC_MCA, PFC_Hot, PFC_Mode, AnzahlI (number of I), and AnzahlJ (number of J). I need to plot the variables PFC_MCA, PFC_Hot, and PFC_Mode as y-values against Ausfallrate on the x-axis, as a scatterplot with connected lines. AnzahlI and AnzahlJ each have 3 levels, giving 9 possible combinations, and are used as facets. I have to draw the scatterplot for each of the 9 combinations using facet_grid.

This is basically the outcome I'm looking for. (The data in my test data set is identical for all 9 combinations, so the plots are identical.)

[Image: the desired plot]

The problem is that I can't get ggplot2 to print a legend. The plot was created with this code:

p<-2.5    #PointSize
lineweight<-0.8  #Lineweight
ggplot(test_outputdata,
        aes(x=Ausfallrate))+
        
        geom_point(aes(y=PFC_MCA),color="red",pch=15,cex=p)+
        geom_line(aes(y=PFC_MCA), color="red",linetype="solid",lwd=lineweight) + 
        
        geom_point(aes(y=PFC_Hot),color="blue", pch=16,cex=p)+
        geom_line(aes(y=PFC_Hot), color="blue",linetype="dashed",lwd=lineweight) +
        
        geom_point(aes(y=PFC_Mode),color="black",pch=17,cex=p)+
        geom_line(aes(y=PFC_Mode), color="black",linetype="dotdash",lwd=lineweight) +
        
        facet_grid(AnzahlJ~AnzahlI,
                     # Rename the facet labels
                     labeller = labeller(
                       AnzahlJ = c(`3` = "J=3", `6` = "J=6", `10` = "J=10"),
                       AnzahlI = c(`100` = "I=100", `500` = "I=500", `1000` = "I=1000")
                    )
                  )+
        labs(y= "PFC")+
        ggtitle("MCAR")

I figured out that it has something to do with the aes() function. Let's ignore the shapes, labels, and line weights for a second and focus on color. If I put the color statement inside the aes() call like this:

ggplot(test_outputdata,
        aes(x=Ausfallrate))+
        
        geom_point(aes(y=PFC_MCA,color="red"))+
        geom_line(aes(y=PFC_MCA,color="red")) + 
        
        geom_point(aes(y=PFC_Hot,color="blue"))+
        geom_line(aes(y=PFC_Hot),color="blue") +
        
        geom_point(aes(y=PFC_Mode,color="black"))+
        geom_line(aes(y=PFC_Mode,color="black")) +
        
        facet_grid(AnzahlJ~AnzahlI)

The result now looks like this:

[Image: the plot with a legend but unexpected colors]

Nice, I get a legend. But for whatever reason the colors are wrong: the supposedly black line is red, etc. I also can't figure out how to adjust the point shapes or the line weights this way. Here I found that I should be able to use e.g. shape=18 inside geom_point() to change the points. This sort of works (doing it outside aes()): it changes the symbol in the plot as expected, but in the legend it changes all points.

All I want to achieve is a working legend and being able to specify the colors, point symbols, symbol sizes, line widths, and linetypes. I've also tried to find something here, but nothing really made sense to me.

desertnaut
Kevin

1 Answer


There are quite a few 'requested changes' in your question, and I've done my best to address them, but if you have further tweaks you'd like to make (and you can't figure it out yourself) please feel free to leave a comment below and I'll take a look.

The approach I've used is based on functions from the tidyverse library:

Load the tidyverse package and example data:

library(tidyverse)

df <- read.table(text = "Ausfallrate    PFC_MCA PFC_Hot PFC_Mode    AnzahlI AnzahlJ
0,1 0,2 0,1 0,2 100 3
0,2 0,25    0,15    0,3 100 3
0,3 0,3 0,2 0,4 100 3
0,4 0,35    0,25    0,5 100 3
0,5 0,4 0,3 0,6 100 3
0,1 0,2 0,1 0,2 100 6
0,2 0,25    0,15    0,3 100 6
0,3 0,3 0,2 0,4 100 6
0,4 0,35    0,25    0,5 100 6
0,5 0,4 0,3 0,6 100 6
0,1 0,2 0,1 0,2 100 10
0,2 0,25    0,15    0,3 100 10
0,3 0,3 0,2 0,4 100 10
0,4 0,35    0,25    0,5 100 10
0,5 0,4 0,3 0,6 100 10
0,1 0,2 0,1 0,2 500 3
0,2 0,25    0,15    0,3 500 3
0,3 0,3 0,2 0,4 500 3
0,4 0,35    0,25    0,5 500 3
0,5 0,4 0,3 0,6 500 3
0,1 0,2 0,1 0,2 500 6
0,2 0,25    0,15    0,3 500 6
0,3 0,3 0,2 0,4 500 6
0,4 0,35    0,25    0,5 500 6
0,5 0,4 0,3 0,6 500 6
0,1 0,2 0,1 0,2 500 10
0,2 0,25    0,15    0,3 500 10
0,3 0,3 0,2 0,4 500 10
0,4 0,35    0,25    0,5 500 10
0,5 0,4 0,3 0,6 500 10
0,1 0,2 0,1 0,2 1000    3
0,2 0,25    0,15    0,3 1000    3
0,3 0,3 0,2 0,4 1000    3
0,4 0,35    0,25    0,5 1000    3
0,5 0,4 0,3 0,6 1000    3
0,1 0,2 0,1 0,2 1000    6
0,2 0,25    0,15    0,3 1000    6
0,3 0,3 0,2 0,4 1000    6
0,4 0,35    0,25    0,5 1000    6
0,5 0,4 0,3 0,6 1000    6
0,1 0,2 0,1 0,2 1000    10
0,2 0,25    0,15    0,3 1000    10
0,3 0,3 0,2 0,4 1000    10
0,4 0,35    0,25    0,5 1000    10
0,5 0,4 0,3 0,6 1000    10", header = TRUE)
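As a side note, here is a minimal sketch of an alternative way to handle the decimal commas, assuming the same whitespace-separated layout as above (only two rows are shown, and the name `df_num` is made up for illustration). Base R's `read.table()` has a `dec` argument, so the columns can be read as numbers directly:

```r
# Assumption: same column layout as the example data, shortened to two rows.
# `df_num` is a hypothetical name used only for this sketch.
df_num <- read.table(text = "Ausfallrate PFC_MCA PFC_Hot PFC_Mode AnzahlI AnzahlJ
0,1 0,2 0,1 0,2 100 3
0,2 0,25 0,15 0,3 100 3", header = TRUE, dec = ",")

# Inspect the column types: the comma-decimal columns should now be numeric
str(df_num)
```

If the data is read this way, the `parse_number()` step in the plotting code should have no character columns left to convert.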

Create the plot:

df %>%
  # Convert the comma-decimal character columns ("0,4") to numeric (0.4)
  mutate(across(where(is.character),
                ~parse_number(.x, locale = locale(decimal_mark = ",")))) %>%
  # Reshape to long format: one row per (Ausfallrate, facet, Type) combination
  pivot_longer(-c(Ausfallrate, AnzahlJ, AnzahlI),
               names_to = "Type",
               values_to = "PFC") %>%
  # Fix the factor level order so the manual scales below line up with it
  mutate(Type = factor(Type, levels = c("PFC_Hot", "PFC_Mode", "PFC_MCA"))) %>%
  # Map color and shape to Type so ggplot2 builds the legend automatically
  ggplot(aes(x = Ausfallrate, y = PFC,
             group = Type, color = Type,
             shape = Type)) +
  geom_point() +
  geom_line() +
  facet_grid(rows = vars(AnzahlJ), cols = vars(AnzahlI),
             labeller = labeller(
               AnzahlJ = c(`3` = "J=3", `6` = "J=6", `10` = "J=10"),
               AnzahlI = c(`100` = "I=100", `500` = "I=500", `1000` = "I=1000")
  )) +
  # Values are assigned in factor level order: Hot, Mode, MCA
  scale_color_manual(values = c("blue", "black", "red"),
                     labels = c("PFC_Hot", "PFC_Mode", "PFC_MCA")) +
  scale_shape_manual(values = c(1, 15, 18),
                     labels = c("PFC_Hot", "PFC_Mode", "PFC_MCA"))

Created on 2023-05-19 with reprex v2.0.2
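To make the reshaping step above more concrete, here is a small sketch of what `pivot_longer()` does to a two-row toy table. The data frame names `wide` and `long` are made up for illustration, and the facet columns are left out to keep it short:

```r
library(tidyr)

# Hypothetical toy data in the same 'wide' layout: one column per PFC variable
wide <- data.frame(
  Ausfallrate = c(0.1, 0.2),
  PFC_MCA     = c(0.20, 0.25),
  PFC_Hot     = c(0.10, 0.15),
  PFC_Mode    = c(0.20, 0.30)
)

# Stack the three PFC columns: their names go into "Type", their values into "PFC"
long <- pivot_longer(wide, -Ausfallrate,
                     names_to = "Type",
                     values_to = "PFC")

print(long)
```

Each original row becomes three rows, one per PFC variable, which is what lets a single `color = Type` mapping produce one legend entry per variable.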

jared_mamrot
  • Thanks a lot, that worked. I was able to add another variable on my own and change the size and linewidth by inserting size= and linewidth= into geom_point() and geom_line() respectively. I'll be honest, I have no idea what you did there or what the problem with my approach was. Could you elaborate a bit, please? – Kevin May 19 '23 at 05:59
  • Sure thing; first I converted the data into a numeric format ("0,4" to "0.4") using `mutate()` from the dplyr library and `parse_number()` from the readr library (both part of the tidyverse package). The biggest change was to convert your data to 'long' format before plotting it. There's a great explanation on what that means [here](https://tidyr.tidyverse.org/articles/pivot.html). Once you have the data in long format it is a lot easier to analyse / plot. I also changed some of the variables to factors (more details on that [here](https://r4ds.had.co.nz/factors.html)). Does that help? – jared_mamrot May 19 '23 at 06:18
  • I'll have to look into it, but that's certainly a starting point. Thanks a lot – Kevin May 19 '23 at 06:47
  • Could you please check my post? The question was too long for a comment. Thanks – Kevin May 21 '23 at 16:14
  • I've been advised to create a new question. Please look [here](https://stackoverflow.com/questions/76301586/r-ggplot2-facet-grid-extra-unintended-na-line-in-plot) – Kevin May 21 '23 at 19:29
  • Hi @Kevin, I had a look at your new question; it looks like you've got two 'high quality' answers. Unless there's something else you wanted to ask, good luck with your research :) – jared_mamrot May 22 '23 at 00:31
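
On the question raised in the comments about what went wrong in the original approach: a string inside `aes()`, such as `color = "red"`, is treated as a data value (a category label), not as a color, so ggplot2 assigns its default palette to the labels. The following is a rough sketch, not the answer's method, of how the wide-format approach could still produce a legend with chosen colors and shapes. It assumes a small made-up data frame `toy`, hypothetical helper vectors `type_colors` and `type_shapes`, and ggplot2 3.4.0 or later for `linewidth`:

```r
library(ggplot2)

# Hypothetical toy data in the original 'wide' layout
toy <- data.frame(
  Ausfallrate = c(0.1, 0.2, 0.3),
  PFC_MCA     = c(0.20, 0.25, 0.30),
  PFC_Hot     = c(0.10, 0.15, 0.20)
)

# Named vectors tie each label used inside aes() to the wanted color and shape
type_colors <- c("PFC_MCA" = "red", "PFC_Hot" = "blue")
type_shapes <- c("PFC_MCA" = 15, "PFC_Hot" = 16)

plt <- ggplot(toy, aes(x = Ausfallrate)) +
  # Inside aes(): labels that end up in the legend. Outside aes(): fixed settings.
  geom_point(aes(y = PFC_MCA, color = "PFC_MCA", shape = "PFC_MCA"), size = 2.5) +
  geom_line(aes(y = PFC_MCA, color = "PFC_MCA"), linewidth = 0.8) +
  geom_point(aes(y = PFC_Hot, color = "PFC_Hot", shape = "PFC_Hot"), size = 2.5) +
  geom_line(aes(y = PFC_Hot, color = "PFC_Hot"), linewidth = 0.8) +
  # The manual scales translate the labels into actual colors and shapes
  scale_color_manual(name = "Type", values = type_colors) +
  scale_shape_manual(name = "Type", values = type_shapes) +
  labs(y = "PFC")

plt
```

This works for a handful of variables, but every new variable needs another pair of layers, which is why the long-format version in the answer tends to scale better.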