I have several sets of data stored in a single data frame. For the sake of this question, the code below generates that data frame, but in practice I only have the merged data frame, not the intermediate ones.

library(ggplot2)

x <- seq.POSIXt(from = strptime("1970-01-01 00:00:00", format = "%Y-%m-%d %H:%M:%S"),
                to = strptime("1970-01-01 00:05:00", format = "%Y-%m-%d %H:%M:%S"),
                by = "10 sec")

x <- rep(x, each = 3)
y <- c()

set.seed(1)

for (i in 1:length(x)) {
  y <- c(y, runif(1, min = 0, max = i))
}

my.data.frame1 <- data.frame(x, y, data = as.factor("data1"))

y <- c()
for (i in 1:length(x)) {
  y <- c(y, runif(1, min = length(x) - i, max = length(x)))
}

my.data.frame2  <- data.frame(x, y, data = as.factor("data2"))

merged <- rbind(my.data.frame1, my.data.frame2)

ggplot(merged, aes(x, y, color = data)) + geom_point() + geom_line()

So for each type of data (data1 and data2), and for each date value on the x axis, I have 3 y values.

The resulting plot is hard to read.

What I want is a geom_ribbon for each data set, spanning the minimum and maximum y values at each time, but I don't know how to do it.

I first tried to extract the min and max values for each time with an aggregate function, as explained here, and to build a new data frame without duplicate x values, but I couldn't make it work.

Can anyone help?

Edit:

The code I tried with aggregate is this one:

aggregate(y ~ x, data = merged, max)

(Same for the min.) But this does not distinguish between the data1 set and the data2 set. I know I could subset, but I guess it can be done using the "by" argument. I just couldn't make it work.

Ben
  • Your approach with `aggregate` sounds like a good one. Can you add the code you tried so we can help troubleshoot? – aosmith Aug 23 '16 at 14:27
  • Updated question :) – Ben Aug 23 '16 at 14:31
  • I think you want to aggregate by "data" and "x", which you can do by putting multiple variables on the RHS of the tilde: `aggregate(y ~ data + x, data = merged, max)`. Per [this answer](http://stackoverflow.com/a/12064297/2461552) you can get the min and max at the same time in the same data.frame (helpful for plotting) so try `aggregate(y ~ data + x, data = merged, FUN = function(x) c(min.y = min(x), max.y = max(x)))` – aosmith Aug 23 '16 at 14:37

1 Answer

You were on the right track; you just need to aggregate by both data and x instead of x alone.
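
To see why the extra grouping variable matters, here is a minimal sketch on a small made-up data frame. `toy` and its values are hypothetical stand-ins for `merged`, chosen only to keep the output small; they are not the question's data.

```r
# Hypothetical stand-in for `merged`: 2 time points, 2 data sets, 3 y values each
t0 <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  x    = rep(c(t0, t0 + 10), each = 3, times = 2),
  y    = c(1, 5, 3, 2, 8, 4, 10, 12, 11, 9, 7, 13),
  data = factor(rep(c("data1", "data2"), each = 6))
)

# Grouping by x alone pools data1 and data2: one row per time point
by_x <- aggregate(y ~ x, data = toy, FUN = max)

# Grouping by data and x keeps the sets apart: one row per (data, x) pair
by_data_x <- aggregate(y ~ data + x, data = toy, FUN = max)

nrow(by_x)       # number of distinct time points
nrow(by_data_x)  # number of distinct (data, x) combinations
```

With only `x` on the right-hand side, the maximum at each time comes from whichever set happens to be larger, so the two ribbons cannot be told apart.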

You can either calculate the min and max by group separately in two aggregate calls and then merge or do both at the same time. For the second approach you'll need an additional step to get the output of the two functions into separate columns.

my.new.df = aggregate(y ~ data + x, data = merged, FUN = function(x) c(min = min(x), max = max(x)))

# Get the min and max as separate columns
my.new.df = as.data.frame(as.list(my.new.df))

ggplot(my.new.df, aes(x, fill = data)) + 
    geom_ribbon(aes(ymin = y.min, ymax = y.max), alpha = 0.6)
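
The first option mentioned above (two separate aggregate calls, then a merge) is not shown in code, so here is a rough sketch of how it could look. The data frame `toy` and the names `mins`, `maxs`, and `ranges` are made up for illustration; the column names `y.min` and `y.max` are chosen to match the ones used in the plot above.

```r
# Hypothetical stand-in for `merged`: 2 time points, 2 data sets, 3 y values each
t0 <- as.POSIXct("1970-01-01 00:00:00", tz = "UTC")
toy <- data.frame(
  x    = rep(c(t0, t0 + 10), each = 3, times = 2),
  y    = c(1, 5, 3, 2, 8, 4, 10, 12, 11, 9, 7, 13),
  data = factor(rep(c("data1", "data2"), each = 6))
)

# One aggregate call per summary, each grouped by data and x
mins <- aggregate(y ~ data + x, data = toy, FUN = min)
maxs <- aggregate(y ~ data + x, data = toy, FUN = max)

# Rename the summary columns so they do not clash when merging
names(mins)[names(mins) == "y"] <- "y.min"
names(maxs)[names(maxs) == "y"] <- "y.max"

# Join on the grouping columns: one row per (data, x) with both bounds
ranges <- merge(mins, maxs, by = c("data", "x"))

# For comparison: the single-call version stores min and max in one matrix
# column named y, which is why it needs the as.data.frame(as.list(...)) step
both <- aggregate(y ~ data + x, data = toy,
                  FUN = function(v) c(min = min(v), max = max(v)))
is.matrix(both$y)
```

`ranges` should then work as a drop-in for `my.new.df` in the `geom_ribbon` call, at the cost of one extra pass over the data.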

You can also make the plot directly using stat = "summary" in geom_ribbon instead of making an aggregate dataset for plotting.

# In ggplot2 >= 3.3.0, fun.max/fun.min replace the deprecated fun.ymax/fun.ymin
ggplot(merged, aes(x, y, fill = data)) + 
    geom_ribbon(alpha = 0.6, stat = "summary", fun.max = max, fun.min = min)
aosmith