
Update: found an answer but I can't say I understand it:

https://stackoverflow.com/a/47498831/2192578

Original issue:

I have two plots that are identical except for the coloring (pictures below). With coloring, the path breaks into disconnected pieces. How can I keep the path connected? I believe this happens because geom_path() is effectively drawing three separate paths and overlaying them. How can I color different stretches of the same path instead?

For reference, this is essentially the code I'm using:

demand %>% ggplot(aes(x = prices, y = quantities, color = active_segments)) + 
  geom_path()

vs

demand %>% ggplot(aes(x = prices, y = quantities)) + 
  geom_path()

[Images: plot without coloring; plot with coloring]

Sample data frame:

structure(list(prices = c(210, 211.5, 213, 214.5, 216, 217.5, 
219, 220.5, 222, 223.5, 225, 226.5, 228, 229.5, 231, 232.5, 234, 
235.5, 237, 238.5, 240, 241.5, 243, 244.5, 246, 247.5, 249, 250.5, 
252, 253.5, 255, 256.5, 258, 259.5, 261, 262.5, 264, 265.5, 267, 
268.5, 270, 271.5, 273, 274.5, 276, 277.5, 279, 280.5, 282, 283.5, 
285, 286.5, 288, 289.5, 291, 292.5, 294, 295.5, 297, 298.5, 300, 
301.5, 303, 304.5, 306, 307.5, 309), quantities = c(1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 
0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.25, 0.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 
0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 
0.25), Segment1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Segment2 = c(1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    Segment3 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), active_segments = c("All segments", 
    "All segments", "All segments", "All segments", "All segments", 
    "All segments", "All segments", "All segments", "All segments", 
    "All segments", "All segments", "All segments", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3 and 2", 
    "Segment 3 and 2", "Segment 3 and 2", "Segment 3", "Segment 3", 
    "Segment 3", "Segment 3", "Segment 3", "Segment 3", "Segment 3", 
    "Segment 3", "Segment 3", "Segment 3", "Segment 3", "Segment 3", 
    "Segment 3", "Segment 3", "Segment 3", "Segment 3", "Segment 3", 
    "Segment 3", "Segment 3", "Segment 3", "Segment 3", "Segment 3", 
    "Segment 3", "Segment 3", "Segment 3", "Segment 3", "Segment 3", 
    "Segment 3")), row.names = c(1L, 151L, 301L, 451L, 601L, 
751L, 901L, 1051L, 1201L, 1351L, 1501L, 1651L, 1801L, 1951L, 
2101L, 2251L, 2401L, 2551L, 2701L, 2851L, 3001L, 3151L, 3301L, 
3451L, 3601L, 3751L, 3901L, 4051L, 4201L, 4351L, 4501L, 4651L, 
4801L, 4951L, 5101L, 5251L, 5401L, 5551L, 5701L, 5851L, 6001L, 
6151L, 6301L, 6451L, 6601L, 6751L, 6901L, 7051L, 7201L, 7351L, 
7501L, 7651L, 7801L, 7951L, 8101L, 8251L, 8401L, 8551L, 8701L, 
8851L, 9001L, 9151L, 9301L, 9451L, 9601L, 9751L, 9901L), class = "data.frame")
Sebastian Rivas
  • Please [make the question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by providing the data in `demand` as plain text (_e.g._ using `dput()`). – neilfws May 12 '20 at 02:13
  • The problem is that the "demand" data frame has over 12k rows haha, let me see if I can get some good markers. Didn't know about this dput thing, pretty cool! – Sebastian Rivas May 12 '20 at 03:34
  • Also a suggestion: for large datasets, you can "pick" out a representative sample of rows through use of the `sample()` function. So for `demand`, you could sample 100 rows via: `dput(demand[sample(1:nrow(demand), 100),])`. – chemdork123 May 12 '20 at 04:57

1 Answer


Interesting question, because it highlights how the color= (or fill=) and group= aesthetics interact in ggplot. ggplot is designed to make graphics simple to create, so it infers some things, such as how to group your data, from the aesthetics you supply.

Here's a sample dataset and simple path plot:

library(ggplot2)

# small example: four groups along one continuous path
df <- data.frame(
  x=1:10,
  y=c(10, 10, 8, 8, 4, 4, 4, 4, 2, 2),
  grp=c('A','A','B','B','C','C','C','C','D','D')
)
p <- ggplot(df, aes(x,y)) + theme_bw()
p + geom_path()


Note that, for this data, the following code produces an identical plot, because geom_line() connects points in order of x and the rows of df are already sorted by x:

p + geom_line()
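
The two calls stop agreeing once the rows are not sorted by x. A minimal sketch of the difference, assuming a made-up three-row data frame (unsorted and p_unsorted are hypothetical names, not from the question) and using layer_data() to inspect the order in which each geom will connect the points:

```r
library(ggplot2)

# Hypothetical data whose rows are NOT in increasing x order
unsorted <- data.frame(x = c(3, 1, 2), y = c(1, 2, 3))
p_unsorted <- ggplot(unsorted, aes(x, y))

# geom_path() keeps the row order of the data
path_x <- layer_data(p_unsorted + geom_path())$x

# geom_line() reorders the observations by x before connecting them
line_x <- layer_data(p_unsorted + geom_line())$x

print(path_x)
print(line_x)
```

For the asker's demand curve the rows are already ordered by price, so either geom should behave the same way there.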

When you think about it, ggplot draws the line by connecting all of the points in sequence. With no grouping information supplied, it treats every observation as part of the same set and connects them all: in ggplot terms, the whole dataset belongs to one aesthetic group. You can reproduce something similar to what you are seeing as follows:

# method 1:
p + geom_line(aes(color=grp))

# method 2:
p + geom_path(aes(color=grp))

# method 3:
p +
    geom_line(data=df[which(df$grp=='A'),], aes(color=grp)) +
    geom_line(data=df[which(df$grp=='B'),], aes(color=grp)) +
    geom_line(data=df[which(df$grp=='C'),], aes(color=grp)) +
    geom_line(data=df[which(df$grp=='D'),], aes(color=grp))


I showed all three methods to make a point: the same thing is going on in each one, and method 3 shows most explicitly how the lines are drawn. Mapping a discrete variable to color= also tells ggplot to group the points by df$grp and to draw a separate line for each group. If you drop the color and still want the same lines, use the group= aesthetic alone:

p + geom_path(aes(group=grp))

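
One way to check the grouping described so far, rather than taking it on faith, is to look at the group column ggplot2 computes for the layer. This is a minimal sketch; n_groups is a hypothetical helper written for this illustration, built on ggplot2's layer_data():

```r
library(ggplot2)

df <- data.frame(
  x = 1:10,
  y = c(10, 10, 8, 8, 4, 4, 4, 4, 2, 2),
  grp = c("A", "A", "B", "B", "C", "C", "C", "C", "D", "D")
)
p <- ggplot(df, aes(x, y))

# Hypothetical helper: number of distinct groups ggplot2 assigns to layer 1
n_groups <- function(plot) length(unique(layer_data(plot)$group))

n_default <- n_groups(p + geom_path())                  # nothing discrete mapped
n_color   <- n_groups(p + geom_path(aes(color = grp)))  # grouping implied by color
n_group   <- n_groups(p + geom_path(aes(group = grp)))  # grouping stated explicitly

print(c(default = n_default, color = n_color, group = n_group))
```

The plain path ends up as a single group, while both color=grp and group=grp split the data into one group per level of grp, which is why the line breaks.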

Great... but how do we fix it?

Now that you understand that color= and group= work in a similar way, what happens when you use them together in the same call? Looking at the above examples, we see two things are actually going on when we use color=:

  1. The color of the lines is adjusted to match df$grp

  2. The line connectivity is drawn according to df$grp

When you supply both group= and color=, group= controls which points are connected and color= controls only the color. Breaking it down, we want to color the line based on df$grp, but connect the points based on the "whole dataset". Here, we have to specify color=grp, but we can put nearly anything we want for the group= aesthetic, as long as it evaluates to the same value for every observation:

# method 1 <-- this works
p + geom_path(aes(color=grp, group=1))

# method 2 <-- this works
p + geom_path(aes(color=grp, group='pasta'))

# method 3 <-- this returns an ERROR
p + geom_path(aes(color=grp, group=pasta))


Most people just use method 1 above (group=1). The point is that ggplot evaluates the constant 1 for every observation in the data frame, and since the value is the same for all of them (hint: 1 is going to be 1), they all fall into the same group. Likewise, "pasta" is always "pasta" for every observation. The third method does not work because ggplot looks for pasta (no quotes) as a column in your data frame, or failing that as a variable in the calling environment, and finds none.
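
To see that the constant really collapses the grouping while leaving the colors alone, the same kind of inspection can be applied to method 1. This is a sketch under the same assumptions as the example data above; ld, n_groups and n_colours are names introduced here for illustration:

```r
library(ggplot2)

df <- data.frame(
  x = 1:10,
  y = c(10, 10, 8, 8, 4, 4, 4, 4, 2, 2),
  grp = c("A", "A", "B", "B", "C", "C", "C", "C", "D", "D")
)
p <- ggplot(df, aes(x, y))

# Layer data for method 1: color by grp, but force a single group
ld <- layer_data(p + geom_path(aes(color = grp, group = 1)))

n_groups  <- length(unique(ld$group))   # how many separate paths get drawn
n_colours <- length(unique(ld$colour))  # how many distinct colors are used

print(c(groups = n_groups, colours = n_colours))
```

As far as I know, geom_path() then colors each segment by the point it starts from, so the segment joining two groups takes the color of the earlier group.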

Hope that illustrates why you are seeing that issue and also highlights the solution. The same relationship holds between group= and fill=, size=, etc. when they are mapped to a discrete variable. Your solution is to use color= to tell ggplot how to color the line, and to supply a constant group= aesthetic so that all points are treated as part of the same group when the line is drawn.
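
Applied to the question, that would look roughly like the sketch below. The demand data frame here is a six-row stand-in with values taken from the posted sample, not the asker's full 12k-row data, and fixed_plot is a name introduced for illustration. The pipe is dropped so the snippet needs only ggplot2:

```r
library(ggplot2)

# Stand-in for the asker's `demand` (six rows picked from the posted sample)
demand <- data.frame(
  prices = c(210, 226.5, 228, 267, 268.5, 309),
  quantities = c(1, 1, 0.6, 0.6, 0.25, 0.25),
  active_segments = c("All segments", "All segments",
                      "Segment 3 and 2", "Segment 3 and 2",
                      "Segment 3", "Segment 3")
)

# color= picks the color per stretch; group = 1 keeps it one connected path
fixed_plot <- ggplot(demand, aes(x = prices, y = quantities,
                                 color = active_segments, group = 1)) +
  geom_path()

print(fixed_plot)
```

The only change from the original call is the added group = 1 inside aes().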

chemdork123