I am plotting residuals from two different methods with the following code:

ggplot(df_index, aes(cases100ppl, lm_errors)) +
  geom_point(alpha=1) +
  geom_point(data=df_index, aes(cases100ppl, error), col="red", alpha=0.2) 

How can I add a legend to this?

The data has a structure like this:

code       cases100ppl   error         lm_errors
E02000001  0.05575558    0.2228769     0.1554760                
E02000002  0.11299289    0.3680860     0.4357544            
E02000003  0.11938429    0.4785204     0.3163543            
E02000004  0.10767160    0.1978992     0.3909933            
E02000005  0.11138804    0.3544542     0.3370886            
E02000007  0.09484474    0.3447380     0.3881657

The output looks something like this: [image]

Thanks!

mankojag
  • Can you please `dput()` your data? The question is not reproducible as it stands. – Serkan Jul 22 '21 at 11:14
  • Sorry, what do you mean `dput()`? – mankojag Jul 22 '21 at 11:48
  • Go ahead and try `dput(data)` in your script and see what happens in the console. It produces a snapshot of your data that we can use to solve your problem. – Serkan Jul 22 '21 at 11:59
  • It produces a very long string of values from my dataset, probably too long to paste it here. – mankojag Jul 22 '21 at 12:10
  • Then try dput(head(data)) and paste that in your post! – Serkan Jul 22 '21 at 12:20
  • @mankojag what Serkan is pointing out is that the example in your question was not reproducible. For future questions, have a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). To produce a minimal data set, you can use `head()` or `subset()`, then use `dput()` to give us something that can be pasted into R immediately. Alternatively, you can use the base R datasets (for the complete list, run `library(help = "datasets")`). – Paul Jul 22 '21 at 12:49

1 Answer

You will need to reshape your data a little so that you can use aes() to set the color and the alpha. This is a very useful trick with ggplot (you can find ways to do it, including the one presented here, on SO posts like this one). You can find more general information about pivoting here in the book R for Data Science, chapter 12, Tidy data.

Accordingly, I pivot your dataframe to create a new variable called error_type. This new variable is then used inside aes(), so the legend is created automatically. Note that, by using the dplyr pipe symbol %>%, I pivot your dataframe just before entering the ggplot world, without changing the original df_index object.
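
To make the reshaping step concrete, here is a minimal sketch of what pivoting does to two rows taken from the question's sample data. The names `df_small` and `df_long` are only illustrative and do not appear in the original code.

```r
library(tidyr)

# Two rows copied from the sample data in the question (illustrative subset)
df_small <- data.frame(
  code        = c("E02000001", "E02000002"),
  cases100ppl = c(0.05575558, 0.11299289),
  error       = c(0.2228769, 0.3680860),
  lm_errors   = c(0.1554760, 0.4357544)
)

# Stack the two residual columns into one value column plus a label column
df_long <- pivot_longer(
  df_small,
  cols      = c("error", "lm_errors"),
  names_to  = "error_type",
  values_to = "error_value"
)

# Each original row now appears twice, once per error type
df_long
```

Because error_type is now an ordinary column, it can be mapped to color and alpha inside aes(), which is what lets ggplot build the legend.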

Then you can use scale_alpha_manual() and scale_colour_manual() to customize the color and the alpha the way you want.

Here is a start:

library(dplyr)
library(tidyr)
library(ggplot2)

df_index %>% 
  pivot_longer(cols = c("error", "lm_errors"), names_to = "error_type", values_to = "error_value") %>% 
  ggplot(data = ., aes(x = cases100ppl, 
                       y = error_value, 
                       color = error_type, 
                       alpha = error_type)) + # do not forget to put alpha inside aes()!
  scale_alpha_manual(values = c("error" = 0.3, "lm_errors" = 1)) +
  geom_point()
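
If you also want to get back the black and red points from the question, a possible variation is to add scale_colour_manual() with a named vector, in the same way as for the alpha. This is only a sketch: `df_demo` is a hypothetical three-row stand-in for df_index, and the color and alpha values are taken from the question's original geom_point() calls.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for df_index: first three rows of the sample data
df_demo <- data.frame(
  code        = c("E02000001", "E02000002", "E02000003"),
  cases100ppl = c(0.05575558, 0.11299289, 0.11938429),
  error       = c(0.2228769, 0.3680860, 0.4785204),
  lm_errors   = c(0.1554760, 0.4357544, 0.3163543)
)

p <- df_demo %>%
  pivot_longer(cols = c("error", "lm_errors"),
               names_to = "error_type", values_to = "error_value") %>%
  ggplot(aes(x = cases100ppl, y = error_value,
             colour = error_type, alpha = error_type)) +
  geom_point() +
  # named vectors tie each value to a level of error_type
  scale_colour_manual(values = c("error" = "red", "lm_errors" = "black")) +
  scale_alpha_manual(values = c("error" = 0.2, "lm_errors" = 1))

p
```

Since both scales are mapped to the same variable and share the same title and labels, ggplot should draw them as a single combined legend.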

Data:

df_index <- structure(list(
  code = c("E02000001", "E02000002", "E02000003",
           "E02000004", "E02000005", "E02000007"),
  cases100ppl = c(0.05575558, 0.11299289, 0.11938429,
                  0.1076716, 0.11138804, 0.09484474),
  error = c(0.2228769, 0.368086, 0.4785204,
            0.1978992, 0.3544542, 0.344738),
  lm_errors = c(0.155476, 0.4357544, 0.3163543,
                0.3909933, 0.3370886, 0.3881657)
), class = "data.frame", row.names = c(NA, -6L))
Paul
  • Ok, that's looking good! How can I make it so that one error_type has an alpha of some value, and the other has a different value? – mankojag Jul 22 '21 at 15:09
  • @mankojag You can use `scale_alpha_manual()` and pass a named vector to the `values` argument, i.e. `scale_alpha_manual(values = c("error" = 0.6, "lm_errors" = 1))` – Paul Jul 23 '21 at 05:13
  • `scale_alpha_manual(values = c(0.3, 1), breaks = c("error", "lm_errors"))` also works. – Paul Jul 23 '21 at 05:25
  • This doesn't seem to be working: ```df_index %>% pivot_longer(cols = c("lm_errors", "error"), names_to = "error_type", values_to = "error_value") %>% arrange(desc(error_type)) %>% ggplot(data = ., aes(x = cases100ppl, y = error_value, color = error_type)) + scale_alpha_manual(values = c("error" = 0.2, "lm_error" = 1)) + geom_point()``` – mankojag Jul 23 '21 at 09:04
  • @mankojag please see the updated answer, I do not get an error. Also, you NEED to put `alpha` **inside** `aes()` – Paul Jul 23 '21 at 09:15
  • Okay, I get it now, thanks for your help! – mankojag Jul 23 '21 at 09:38