I have a dataset of measurements from different genera. Some of these genera have multiple datapoints, others have only one. I've written the following code to produce a scatterplot of my measurements table containing Measurement_A and Measurement_B.

# Load ggplot2 and ggpmisc
library(ggplot2)
library(ggpmisc)
measurements <- read.csv("measurements.csv")
# Define colours and shapes for each genus (which I've called A, B, C and D)
colour_specifications <- c("A"="green", "B"="red", "C"="yellow",  "D"="blue")
shape_specifications <- c("A"=15, "B"=16, "C"=17, "D"=18)

# Create scatterplot of data from each genus
# and plot a regression line for each genus and show the R^2 value for each line
scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
    colour=Genus, shape=Genus))+
  geom_point()+theme_classic()+
  scale_color_manual(values = colour_specifications)+
  scale_shape_manual(values= shape_specifications)+
  geom_smooth(method=lm, se=FALSE)+
  stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)
scatterplot

Although genera C and D have only one datapoint each, the legend shows a regression line through every symbol, including theirs. I would like to plot regression lines for only the genera that have multiple datapoints (A and B). I have tried the subset function as follows,

scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
    colour=Genus, shape=Genus))+
  geom_point()+theme_classic()+
  scale_color_manual(values = colour_specifications)+
  scale_shape_manual(values= shape_specifications)+
  geom_smooth(data=subset(measurements, Genus="A"),
          method='lm', se=FALSE)+
  geom_smooth(method=lm, se=FALSE)+
  stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)

but it hasn't made any difference to the legend which makes me think I've done something wrong.

Is there a way to plot the regression lines for just genera A and B and not C and D?

ETA:

This is the data I'm using for this:

Genus <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "D")
Specimen <- c("Aa","Ab","Ac","Ad","Ae","Af","Ag","Ah","Ba", "Bb", "Bc", "Bd", "Ca", "Da")
Measurement_A <- c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55)
Measurement_B <- c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)
measurements <- data.frame(Genus, Specimen, Measurement_A, Measurement_B)

All I want to do is create a single plot of my data with a single legend but with genera A and B showing regression lines and the R^2 values. Both sets of code above give me a legend with regression lines through all the symbols, not just A and B. As I explained above, I have tried the subset function but it does not change the legend. When I tried

scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
    colour=Genus, shape=Genus))+
  geom_point()+theme_classic()+
  scale_color_manual(values = colour_specifications)+
  scale_shape_manual(values= shape_specifications)+
  geom_smooth(method=lm, se=FALSE)+
  scale_color_manual(values = c("green","red"),limits = c("A","B"))+
  stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)
scatterplot

I get a message saying "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale" and then I get two legends: one with the preset symbols for all genera but all in black, and one with the colours but only circular symbols, for genera A and B.

[Image: plot of data]

ETA: I have found the answer. I just needed to add show.legend=FALSE inside the geom_smooth call.
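For anyone reading later, here is a minimal sketch of what that looks like with the example data above. The Specimen column is left out because the plot does not use it, and formula = y ~ x is spelled out only to silence the message geom_smooth otherwise prints. Note that show.legend=FALSE drops the line from every legend key, A and B included, so the legend ends up showing symbols only.

```r
library(ggplot2)

# Example data copied from the question (Specimen omitted, it is not plotted)
measurements <- data.frame(
  Genus = c(rep("A", 8), rep("B", 4), "C", "D"),
  Measurement_A = c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55),
  Measurement_B = c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)
)
colour_specifications <- c("A" = "green", "B" = "red", "C" = "yellow", "D" = "blue")
shape_specifications <- c("A" = 15, "B" = 16, "C" = 17, "D" = 18)

scatterplot <- ggplot(measurements,
                      aes(x = Measurement_A, y = Measurement_B,
                          colour = Genus, shape = Genus)) +
  geom_point() +
  theme_classic() +
  scale_color_manual(values = colour_specifications) +
  scale_shape_manual(values = shape_specifications) +
  # show.legend = FALSE keeps the regression line out of the legend keys;
  # the points layer still contributes the coloured symbols
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, show.legend = FALSE)
scatterplot
```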

1 Answer

Why don't you just specify both of your factor levels via subset?

scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
    colour=Genus, shape=Genus)) +
  geom_point() +
  geom_smooth(data=subset(measurements, Genus == "A" | Genus == "B"), method='lm', se=FALSE)
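A slightly more general sketch of the same idea, in case it helps: instead of naming A and B by hand, pick out the genera that have more than one datapoint and pass that subset to both geom_smooth and stat_poly_eq, so the R^2 labels are also limited to the regressed genera. The names counts, multi and regressed are made up for this illustration, and I am assuming stat_poly_eq takes a data argument like other ggplot2 layers. Remember that subset needs == (or %in%) for the comparison; a single = is treated as a named argument and no rows are filtered.

```r
library(ggplot2)
library(ggpmisc)

# Example data copied from the question (Specimen omitted, it is not plotted)
measurements <- data.frame(
  Genus = c(rep("A", 8), rep("B", 4), "C", "D"),
  Measurement_A = c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55),
  Measurement_B = c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)
)

# Genera with more than one datapoint (hypothetical helper names)
counts <- table(measurements$Genus)
multi <- names(counts[counts > 1])
regressed <- subset(measurements, Genus %in% multi)

scatterplot <- ggplot(measurements,
                      aes(x = Measurement_A, y = Measurement_B,
                          colour = Genus, shape = Genus)) +
  geom_point() +
  # Fit and label only the genera that can actually be regressed
  geom_smooth(data = regressed, method = "lm", formula = y ~ x, se = FALSE) +
  stat_poly_eq(data = regressed, formula = y ~ x,
               eq.with.lhs = FALSE, parse = TRUE)
scatterplot
```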
tifu
  • The legend still shows regression lines on symbols for genera that haven't been regressed. – Sarah Hearne Dec 08 '17 at 13:26
  • You can add `+ scale_color_manual(values = c("green","red"),limits = c("A","B"))` at the end of the `ggplot()` call to remove from the legend the categories that have not been regressed – Antonio Dec 08 '17 at 20:50
  • That's created a second legend, so I have one legend with the symbols I've preset but all in black, and a second legend for just A and B with the colours but not the symbols. I want a single legend showing the symbols for all my genera (A, B, C and D). I want to add regression lines for just the genera with multiple datapoints (A and B) and I want the legend to reflect this by only showing the regression line through the symbols of A and B, leaving the legend for C and D with just the symbols. Is this possible and if so, how do I do it? – Sarah Hearne Dec 10 '17 at 01:41
  • It'd be easier to find out what's wrong with the code if you provided a reproducible example. – tifu Dec 11 '17 at 07:14
  • How do I do that? – Sarah Hearne Dec 11 '17 at 08:05
  • Have a look at this: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – tifu Dec 11 '17 at 08:16
  • This probably isn't the most elegant way of doing this but hopefully this works (and I can't get it to format to give each line of code a new line in the comment), `Genus <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "D")` `Specimen <- c("Aa","Ab","Ac","Ad","Ae","Af","Ag","Ah","Ba", "Bb", "Bc", "Bd", "Ca", "Da")` `Measurement_A <- c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55)` `Measurement_B <- c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)` `measurements <- data.frame(Genus, Specimen, Measurement_A, Measurement_B)` – Sarah Hearne Dec 11 '17 at 08:34
  • You could just edit your question to include it, for better formatting and easier future reference :) Also, include the line of ggplot code that you want fixed. – tifu Dec 11 '17 at 08:45
  • Ah, thanks, I didn't realise the question was editable. I've included all the code I've used in my question. What I want is to get a legend which doesn't show a regression line through the symbols for those genera that haven't been regressed. I've given the code I've tried (the `subset` section) but it didn't change anything. I have searched and searched but can't find a way to have some genera regressed and others not, and to maintain a single legend. The suggestions so far have either not changed the legend or given me two legends. I can do this in Excel but can't work out how to do it in R. – Sarah Hearne Dec 11 '17 at 08:52
  • I don't know what I've done differently but the subset code is now working. Thank you!! Though it's still showing the R^2 value, which is annoying. At least it's removed the regression line. – Sarah Hearne Jan 17 '18 at 04:21