
I'm trying to make a plot, and show different colors when p > 0.5, but when I use the color aes, the line appears to be disconnected.


library(tidyverse)

data <- tibble(n = 1:365)

prob <- function (x) {
  pr <- 1
  # seq_len(x)[-1] is empty for x = 1, whereas 2:x would count down (2, 1)
  for (t in seq_len(x)[-1]) {
    pr <- pr * ((365 - t + 1) / 365) 
  }
  return(1 - pr)
}


data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
  ggplot(aes(x = n, y = prob, color = prob > 0.5)) + geom_line() + 
  scale_x_continuous(breaks = seq(0,100,10))

Does anyone know why? Removing the color aesthetic from aes() gives a single, connected line.

Norhther

1 Answer

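This happens because the logical condition prob > 0.5 is a discrete variable, and ggplot2 by default groups the data by every discrete variable mapped in the plot. Mapping the condition to color therefore splits your data into two groups, each drawn as its own line, with nothing drawn between them: the first group has max(prob) = .476 (at n = 22) and the second has min(prob) = .507 (at n = 23). Hence, the vertical gap on the line plot is the gap between these two numbers.

You can see it if you filter the modified data for values close to .5: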

data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
  filter(between(prob, .4, .6))
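
As a cross-check that does not need the tidyverse, the same probabilities can be computed in vectorised form with cumprod(). This is only a sketch; the names p, below and above are made up for illustration and are not part of the original code:

```r
# Birthday-problem probabilities for n = 1..99; matches prob(n) from the
# question for n >= 2:  p[n] = 1 - prod over k = 0..n-1 of (365 - k) / 365
p <- 1 - cumprod((365 - 0:98) / 365)

below <- max(p[p <= 0.5])   # largest value on the FALSE side of the split
above <- min(p[p > 0.5])    # smallest value on the TRUE side of the split

round(c(below, above), 3)   # 0.476 0.507
which(p > 0.5)[1]           # 23: first n whose probability exceeds 0.5
```

The two printed values are the end points of the two colored lines, so the visible gap spans n = 22 to n = 23.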

If we modify your example so that the two values on either side of the threshold almost coincide:

data2 <- data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100)

# bring the values on either side of 0.5 closer together
data2$prob[22] <- .49999999999999
data2$prob[23] <- .50000000000001

data2 %>%
  ggplot(aes(x = n, y = prob, color = prob >= 0.5)) + geom_line() + 
  scale_x_continuous(breaks = seq(0,100,10))

The gap becomes significantly smaller:

Line plot with the gap

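However, it is still present (now mostly horizontally, between n = 22 and n = 23), because the two color groups are still drawn as separate lines and no segment connects the last point of one to the first point of the other.

A simple way of fixing this is to add the dummy aesthetic group = 1 inside aes(). It overrides the default grouping (the interaction of all discrete variables in the plot, which here is only the logical color variable), so geom_line() draws one connected line whose color changes along the way.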

data %>%
  mutate(prob = map_dbl(n, prob)) %>%
  filter(n < 100) %>%
#add 'group = 1' below
  ggplot(aes(x = n, y = prob, color = prob >= 0.5, group = 1)) + geom_line() + 
  scale_x_continuous(breaks = seq(0,100,10))

Line plot without the gap
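
To see the grouping directly, one option is to inspect the data that each layer actually draws with ggplot2's layer_data(). The following is a minimal sketch, assuming ggplot2 is installed; df, p_split and p_joined are illustrative names, and the probabilities are computed with cumprod() rather than the prob() function from the question:

```r
library(ggplot2)

# Birthday-problem probabilities for n = 1..99 (vectorised form)
df <- data.frame(n = 1:99)
df$prob <- 1 - cumprod((365 - (df$n - 1)) / 365)

# Same plot with and without the dummy group aesthetic
p_split  <- ggplot(df, aes(n, prob, colour = prob > 0.5)) + geom_line()
p_joined <- ggplot(df, aes(n, prob, colour = prob > 0.5, group = 1)) + geom_line()

# layer_data() returns the data frame the layer is drawn from,
# including the computed `group` column
unique(layer_data(p_split)$group)    # two distinct groups: two separate lines
unique(layer_data(p_joined)$group)   # a single group: one connected line
```

With two groups, geom_line() never joins n = 22 to n = 23; with one group it does, and the color simply changes from one segment to the next.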

RaV
  • Could you please explain what group = 1 does here? – Norhther Apr 05 '20 at 10:25
  • @Norhther there is an excellent explanation of this _dummy aesthetic_ under [this question](https://stackoverflow.com/questions/39878813/ggplot-geom-bar-meaning-of-aesgroup-1). Basically, by default `group` is set to the interaction of all discrete variables in the plot. – RaV Apr 05 '20 at 10:57
  • I read the post before, but I still don't understand the default behaviour in geom_line – Norhther Apr 05 '20 at 18:49
  • @Norhther, I might be wrong, but as far as I understand it: (1) if we supply only the _x_ and _y_ variables, ggplot receives the task of drawing one line through them. However, (2) if we also supply a discrete _color_ variable that splits the _x_-_y_ pairs into two (color) groups, then we get two lines, since by default the groups are formed from the discrete variables in the plot, and the data are now split by `color = prob > 0.5`. But (3) if we also add the `group = 1` aesthetic, we tell ggplot that, despite the color grouping, we still want to keep the whole dataset as one consistent group. – RaV Apr 06 '20 at 06:04