
I am looking to plot smoothed lines of subsets of a dataset on top of an overall plot of the data.

For example, in the image below, I want to fit a separate geom_smooth line to the green and blue data points on the right. [Image: full data plot]

A reproducible example of the code would be as follows:

library(datasets)
library(ggplot2)
library(tidyr)

iris_subset <- subset(iris, Species == "virginica" | Species == "versicolor")

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Species))
p <- p + geom_point()
p <- p + geom_smooth(method="loess", colour = "black")
p <- p + geom_smooth(aes(data = iris_subset, x = Sepal.Length, y = Sepal.Width), method="loess", colour = "red")
print(p)

When I attempt this, however, it throws the error: "Error: Aesthetics must be either length 1 or the same as the data"

This seems to imply that it doesn't like the inclusion of a subset of different length plotted on the same axis as the original, but after playing with it I've hit a wall.

F. Windram
  • 1) always try to give a reproducible example: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – tjebo Jul 03 '18 at 15:17
  • 2) avoid using `$` in `aes()` – tjebo Jul 03 '18 at 15:17
  • 3) you can subset your data in the "data" argument of each function, which you currently do not do – tjebo Jul 03 '18 at 15:18
  • 4) or specify groups with color/ fill/ group aesthetics or using facets – tjebo Jul 03 '18 at 15:20
  • Last but not least: Possible duplicate of [Subset and ggplot2](https://stackoverflow.com/questions/18165578/subset-and-ggplot2) – tjebo Jul 03 '18 at 15:22
  • Firstly thanks for the tips. I suspected that using `$` was bad practice but just hadn't really gone through to change it. I will update with a more reproducible example. Having taken a look, this isn't really a duplicate. The issue being that in the linked duplicate, P1 and P3 are individual sets. In my scenario, one is a direct subset of the other. So I cannot group by a variable, as that would exclude some of the data from the first plot (a plot of all data). – F. Windram Jul 03 '18 at 15:58
  • I doubt that you cannot group by a variable... well have a look at @Nicolas Velasquez suggestion and/or update with a reproducible example. Let me know when done and I might have a look – tjebo Jul 03 '18 at 16:00
  • I am pretty sure you could group by variable. Specifically, you could still plot all your data in one geom_x() and a subset of your data in a second geom_x() without problems. – Nicolás Velasquez Jul 03 '18 at 16:03
  • Problem solved, I was trying to source data inside the `aes()` brackets. Thanks for all the help. I'll write it up now. – F. Windram Jul 03 '18 at 16:19
  • Just put `data = iris.subset` outside of aes(). – Carlos Eduardo Lagosta Jul 03 '18 at 16:33

1 Answer


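Just for future reference: the problem was that I was passing data = iris_subset inside aes(). The data argument belongs to the layer itself (here geom_smooth()), not to aes(), which only maps variables to aesthetics.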

The solved code is as follows:

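library(datasets)  # provides the iris dataset
library(ggplot2)
library(tidyr)

# Subset used only by the second smoother
iris_subset <- subset(iris, Species == "virginica" | Species == "versicolor")

p <- ggplot(data = iris, aes(x = Sepal.Length, y = Sepal.Width))
# Colour by species for the points only
p <- p + geom_point(aes(colour = Species))
# Smoother over the full dataset (inherits data from ggplot())
p <- p + geom_smooth(method="loess", colour = "black")
# Smoother over the subset: data is a layer argument, outside aes()
p <- p + geom_smooth(data = iris_subset, aes(x = Sepal.Length, y = Sepal.Width), method="loess", colour = "red")
print(p)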

This gives the intended output: a black smoothed line over all of the data and a red smoothed line over the versicolor and virginica points only.
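As suggested in the comments, the subset can also be supplied directly in the layer's data argument rather than stored in its own variable. The following is only a minimal sketch, assuming a ggplot2 version in which a layer's data argument accepts a function that is applied to the plot data; the explicit formula = y ~ x is added here just to state the smoothing formula and is not part of the original code.

```r
library(ggplot2)

# Sketch: same plot as above, but the subset is computed inside the layer.
# Assumption: `data` may be a function that receives the plot's data frame.
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  # Colour by species for the points only
  geom_point(aes(colour = Species)) +
  # Smoother over all rows of iris
  geom_smooth(method = "loess", formula = y ~ x, colour = "black") +
  # Smoother over versicolor and virginica only
  geom_smooth(
    data = function(d) subset(d, Species %in% c("virginica", "versicolor")),
    method = "loess", formula = y ~ x, colour = "red"
  )
print(p)
```

Whether this reads better than a named iris_subset is a matter of taste; a named subset is easier to reuse across several layers.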

F. Windram