I have a data frame with a binary outcome of interest, y, a date column and a grouping variable, like the example below.

date <- c("2000-05-01", "2000-05-01", "2000-05-01", "2000-05-02", "2000-05-02", "2000-05-02", "2000-05-02", "2000-05-03", "2000-05-03", "2000-05-03", "2000-05-04", "2000-05-04")
y <- c("1", "0", "0", "0","1","1","0", "1","1","0", "1","0")
group <- c("1", "2", "3", "2", "1", "1", "2", "3", "2", "1", "1", "3")
df <- as.data.frame(cbind(date, y, group))

From this, I would like to plot, as a line chart, the proportion of y = 1 (on the y-axis) over time (on the x-axis) by group. (The actual data frame contains more than a thousand observations per group, so a line will make sense, unlike in this example. ;) )

Preferably, I would like to do this with R's built-in plotting functions, but if need be also with ggplot2.

Other similar questions, e.g. here, have been answered with solutions that aren't feasible for me (wrong type of plot), so I'm a bit lost and would appreciate help!

Ivo

1 Answer

One way would be to precalculate the proportion and plot it using geom_line:

library(tidyverse)

df %>%
  mutate(date = as.POSIXct(date)) %>%        # parse the date strings as date-times
  group_by(group, date) %>%                  # one row per group and date
  summarise(prop = sum(y == "1") / n()) %>%  # proportion of y == "1" (y is the column in df)
  ggplot() +
  geom_line(aes(x = date, y = prop, color = group)) +
  geom_point(aes(x = date, y = prop, color = group))
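Since the question prefers built-in plotting, here is a minimal base R sketch of the same idea. It is only one possible approach, not part of the original answer: it assumes the example data from the question, computes the proportions with `tapply`, and draws one line per group with `plot` and `lines`. The names `prop` and `dates` are illustrative.

```r
# Example data from the question (all columns are character vectors).
df <- data.frame(
  date  = c("2000-05-01", "2000-05-01", "2000-05-01", "2000-05-02",
            "2000-05-02", "2000-05-02", "2000-05-02", "2000-05-03",
            "2000-05-03", "2000-05-03", "2000-05-04", "2000-05-04"),
  y     = c("1", "0", "0", "0", "1", "1", "0", "1", "1", "0", "1", "0"),
  group = c("1", "2", "3", "2", "1", "1", "2", "3", "2", "1", "1", "3")
)

# Proportion of y == "1" for every date/group combination.
# Rows are dates, columns are groups; combinations without data are NA.
prop  <- tapply(df$y == "1", list(df$date, df$group), mean)
dates <- as.Date(rownames(prop))

# Draw an empty frame first, then add one line per group.
plot(dates, prop[, 1], type = "n", ylim = c(0, 1),
     xlab = "date", ylab = "proportion of y = 1")
for (j in seq_len(ncol(prop))) {
  lines(dates, prop[, j], type = "b", col = j)
}
legend("bottomleft", legend = colnames(prop), col = seq_len(ncol(prop)),
       lty = 1, pch = 1, title = "group")
```

Dates on which a group has no observations show up as gaps in that group's line, because the corresponding cell of `prop` is `NA`.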

Answer to the updated question in the comments:

df %>%
  mutate(date = as.POSIXct(date)) %>%        # parse the date strings as date-times
  group_by(group, date) %>%                  # one row per group and date
  summarise(prop = sum(y == "1") / n()) %>%  # proportion of y == "1"
  ggplot() +
  geom_line(aes(x = date, y = prop, color = group)) +
  geom_point(aes(x = date, y = prop, color = group)) +
  geom_vline(xintercept = as.POSIXct("2000-05-03"))  # vertical line at the event date
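For base graphics, the comment about `abline(v = ...)` can be addressed along the following lines. This is a hedged sketch rather than part of the original answer: it assumes the x-axis was drawn from `Date` values (days since 1970-01-01, not `POSIXct` seconds), and it uses the group 1 proportions from the example data purely for illustration. The name `event` is hypothetical.

```r
event <- as.Date("2000-05-03")

# A Date is stored as the number of days since 1970-01-01,
# so this is the "absolute number" that corresponds to the date.
as.numeric(event)  # 11080

dates <- as.Date(c("2000-05-01", "2000-05-02", "2000-05-03", "2000-05-04"))
prop  <- c(1, 1, 0, 1)  # proportions of group 1 in the example data

plot(dates, prop, type = "b", ylim = c(0, 1),
     xlab = "date", ylab = "proportion of y = 1")
abline(v = as.numeric(event), lty = 2)  # dashed vertical line at the event date
```

If the axis is built from `as.POSIXct()` values instead, the position would have to be given in seconds, e.g. `as.numeric(as.POSIXct("2000-05-03"))`.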

missuse
  • I'm really sorry, this seems to have been unclear, but I'm looking for a line chart. – Ivo Mar 10 '18 at 16:14
  • If anyone wants to use this: this solution requires the packages `magrittr` and `dplyr`. – Ivo Mar 12 '18 at 10:11
  • I get `Error in summarise_impl(.data, dots): Evaluation error: object 'y' not found.` Is there anything I'm doing wrong? – Ivo Mar 12 '18 at 10:18
  • Added the needed library in the edit. Thanks. Do you have a column called `y` in your `df`? – missuse Mar 12 '18 at 10:28
  • Ok, that was it--didn't realise the first `y` in your code referred to my variable, will label better in the future. Thank you for your help! [My graph](https://www.dropbox.com/s/ncqs986hejrw9x8/Testplot.png?dl=0) looks a bit weird: the lines are vertical and don't seem to connect the dots of a group over time. However, when turning off the lines [(see here)](https://www.dropbox.com/s/sjw1et5x7wii1c3/Testplot_2.png?dl=0), the dots are fine. I'm not sure if this is a problem with my data, then? (And if so, what it might be.) – Ivo Mar 12 '18 at 11:49
  • `df %>% mutate(group = factor(group)) %>%` Use this line instead of `df %>%` and it should work. Your groups are numeric, and that is causing the issue. Please post an update. – missuse Mar 12 '18 at 11:59
  • On this issue: I'm trying to introduce a vertical line using `abline(v=...)`, but does `R` accept that? And if so, I'm not sure about the value for `v` here: is it possible to get the value for a specific date (here: `2005-05-03`, for example) or to find out which absolute number corresponds to it for `abline`? (I'm aware this is in addition to the initial question but don't think it justifies opening a new question.) – Ivo Mar 12 '18 at 13:05
  • No, I'm trying to add a vertical line at a certain date. My assumption was that this would be done using `abline()`, but I might be incorrect and am not sure how to combine it with the plots you pointed out. (For context: the idea is that an external event happened for one of the two groups, and I want to visualize that event with one vertical line on the date of the event.) – Ivo Mar 12 '18 at 13:27
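The vertical-lines problem from the comments can be reproduced and fixed roughly as follows. This is a sketch under assumptions: `df_num` is a hypothetical variant of the example data with a numeric `group` column, and the `.groups` argument assumes dplyr 1.0.0 or later. With a numeric group, ggplot2 maps color to a continuous scale and does not split the data into separate lines, so `geom_line` connects all points in order of date; converting the group to a factor gives one line per group.

```r
library(dplyr)
library(ggplot2)

# Hypothetical variant of the example data with numeric y and group columns.
df_num <- data.frame(
  date  = as.Date(c("2000-05-01", "2000-05-01", "2000-05-01", "2000-05-02",
                    "2000-05-02", "2000-05-02", "2000-05-02", "2000-05-03",
                    "2000-05-03", "2000-05-03", "2000-05-04", "2000-05-04")),
  y     = c(1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0),
  group = c(1, 2, 3, 2, 1, 1, 2, 3, 2, 1, 1, 3)
)

prop_df <- df_num %>%
  mutate(group = factor(group)) %>%   # discrete groups: one line per group
  group_by(group, date) %>%
  summarise(prop = mean(y == 1), .groups = "drop")

ggplot(prop_df, aes(x = date, y = prop, color = group)) +
  geom_line() +
  geom_point()
```

Keeping `group` numeric and adding `group = group` inside `aes()` should also separate the lines, although the color legend then stays continuous.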