I have a dataset of measurements from different genera. Some of these genera have multiple datapoints, others have only one. I've written the following code to produce a scatterplot of my measurements table containing Measurement_A and Measurement_B.
# Load ggplot2 and ggpmisc
library(ggplot2)
library(ggpmisc)
measurements <- read.csv("measurements.csv")
# Define colours and shapes for each genus (which I've called A, B, C and D)
colour_specifications <- c("A"="green", "B"="red", "C"="yellow", "D"="blue")
shape_specifications <- c("A"=15, "B"=16, "C"=17, "D"=18)
# Create scatterplot of data from each genus
# and plot a regression line for each genus and show the R^2 value for each line
scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
colour=Genus, shape=Genus))+
geom_point()+theme_classic()+
scale_color_manual(values = colour_specifications)+
scale_shape_manual(values= shape_specifications)+
geom_smooth(method=lm, se=FALSE)+
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)
scatterplot
Despite genera C and D having only one datapoint each, the legend shows a regression line as well as the symbol for every genus. I would like to plot regression lines for only the subset of genera that have multiple datapoints (genera A and B). I have tried the subset function as follows:
scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
colour=Genus, shape=Genus))+
geom_point()+theme_classic()+
scale_color_manual(values = colour_specifications)+
scale_shape_manual(values= shape_specifications)+
geom_smooth(data=subset(measurements, Genus="A"),
method='lm', se=FALSE)+
geom_smooth(method=lm, se=FALSE)+
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)
but it hasn't made any difference to the legend, which makes me think I've done something wrong.
Is there a way to plot the regression lines for just genera A and B and not C and D?
ETA:
This is the data I'm using for this:
Genus <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "D")
Specimen <- c("Aa","Ab","Ac","Ad","Ae","Af","Ag","Ah","Ba", "Bb", "Bc", "Bd", "Ca", "Da")
Measurement_A <- c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55)
Measurement_B <- c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)
measurements <- data.frame(Genus, Specimen, Measurement_A, Measurement_B)
All I want to do is create a single plot of my data with a single legend, with only genera A and B showing regression lines and R^2 values. Both sets of code above give me a legend with regression lines through all the symbols, not just A and B. As I explained above, I have tried the subset function, but it does not change the legend. When I tried
scatterplot <- ggplot(measurements, aes(x=Measurement_A, y=Measurement_B,
colour=Genus, shape=Genus))+
geom_point()+theme_classic()+
scale_color_manual(values = colour_specifications)+
scale_shape_manual(values= shape_specifications)+
geom_smooth(method=lm, se=FALSE)+
scale_color_manual(values = c("green","red"),limits = c("A","B"))+
stat_poly_eq(formula = y ~ x, eq.with.lhs=FALSE, parse = TRUE)
scatterplot
I get a message saying "Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the existing scale", and then I get two legends: one with the preset symbols for all genera but all in black, and one with the colours but only circular symbols, for genera A and B.
ETA:
I have found the answer: I just needed to add show.legend=FALSE inside the geom_smooth() call.
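For anyone who finds this later, here is a minimal sketch of the fix using the example data from above. The name multi_point is something I made up for illustration. Restricting the fitted layers to genera with more than one datapoint is an extra step I am assuming is sensible here; as far as I can tell, it is not needed for the legend fix itself, which comes only from show.legend = FALSE.

```r
# Load ggplot2 and ggpmisc
library(ggplot2)
library(ggpmisc)

# Example data from the question
Genus <- c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "D")
Specimen <- c("Aa", "Ab", "Ac", "Ad", "Ae", "Af", "Ag", "Ah", "Ba", "Bb", "Bc", "Bd", "Ca", "Da")
Measurement_A <- c(60, 80, 100, 105, 120, 130, 140, 95, 70, 80, 90, 100, 170, 55)
Measurement_B <- c(10, 15, 30, 30, 35, 40, 40, 27, 10, 17, 20, 27, 15, 5)
measurements <- data.frame(Genus, Specimen, Measurement_A, Measurement_B)

# Colours and shapes for each genus
colour_specifications <- c("A" = "green", "B" = "red", "C" = "yellow", "D" = "blue")
shape_specifications <- c("A" = 15, "B" = 16, "C" = 17, "D" = 18)

# Assumption: only fit lines for genera with more than one datapoint.
# multi_point is a hypothetical helper name, not part of the original code.
genus_counts <- table(measurements$Genus)
multi_point <- subset(measurements, Genus %in% names(genus_counts[genus_counts > 1]))

scatterplot <- ggplot(measurements, aes(x = Measurement_A, y = Measurement_B,
                                        colour = Genus, shape = Genus)) +
  geom_point() +
  theme_classic() +
  scale_color_manual(values = colour_specifications) +
  scale_shape_manual(values = shape_specifications) +
  # show.legend = FALSE keeps the line out of the legend keys
  geom_smooth(data = multi_point, method = "lm", formula = y ~ x,
              se = FALSE, show.legend = FALSE) +
  # R^2 labels only for the genera that actually have a fitted line
  stat_poly_eq(data = multi_point, formula = y ~ x,
               eq.with.lhs = FALSE, parse = TRUE)
scatterplot
```

I think my earlier subset attempt did nothing because I wrote Genus="A" with a single = sign. As far as I understand, subset() then treats it as an extra named argument instead of a condition and returns every row; a comparison needs == or %in%, as in the sketch above.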