
I have 1000 time-series vectors and want to plot all of them on a single ggplot2 graph, with the x axis running over 1:1000. I also want to set alpha relatively low so that the density of certain areas is visible.

Is there a way to do this without 1000 geom_line statements?

john

2 Answers


Here is a reproducible example in R that plots 1000 lines with ggplot2. The key is to reshape the data to long format and map the series identifier to the group aesthetic, so that a single geom_line call draws every line:

library(ggplot2)
library(reshape2)

n = 1000
set.seed(123)

mat = matrix(rnorm(n^2), ncol=n)
cmat = apply(mat, 2, cumsum)
cmat = t(cmat)
rownames(cmat) = paste("trial", seq(n), sep="")
colnames(cmat) = paste("time", seq(n), sep="")

dat = as.data.frame(cmat)
dat$trial = rownames(dat)
mdat = melt(dat, id.vars="trial")
mdat$time = as.numeric(gsub("time", "", mdat$variable))


p = ggplot(mdat, aes(x=time, y=value, group=trial)) +
    theme_bw() +
    theme(panel.grid=element_blank()) +
    geom_line(size=0.2, alpha=0.1)

# Assigning to p does not draw anything; print it to render the plot
print(p)
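The reshaping step above uses reshape2::melt. As a hedged alternative, the same wide-to-long conversion can be sketched with tidyr::pivot_longer. The sizes below (5 series, 20 time points) and the names wide, long and p_small are illustrative assumptions, chosen only to keep the example small:

```r
library(tidyr)
library(ggplot2)

set.seed(123)
n_series = 5   # illustrative sizes, not the 1000 x 1000 case above
n_time = 20

# One random walk per column, then transpose to one row per series
walks = t(apply(matrix(rnorm(n_series * n_time), ncol = n_series), 2, cumsum))
wide = as.data.frame(walks)
colnames(wide) = paste0("time", seq_len(n_time))
wide$trial = paste0("trial", seq_len(n_series))

# Wide -> long: one row per (trial, time) pair; names_prefix strips "time"
long = pivot_longer(wide, cols = -trial, names_to = "time",
                    names_prefix = "time", values_to = "value")
long$time = as.integer(long$time)

# A single geom_line draws every series because of group = trial
p_small = ggplot(long, aes(x = time, y = value, group = trial)) +
  geom_line(alpha = 0.5)
print(p_small)
```

The resulting long data frame has the same shape as mdat (one row per trial and time point), so the plotting code is unchanged.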


bdemarest
  • Terrific example! – ColorStatistics Jun 09 '20 at 10:17
  • Hi, I see that this is an old thread. But can you help me with changing the color of the above plot based on two groups. For example, I have about 500 "good" (green color) batches and 500 "bad" (red color) batches. I want to plot all 1000 in green and red hue. And also add a mean line (solid line) for both groups. – MasterShifu Aug 04 '20 at 18:23

I suggest you do it like this:

  1. Build the data as a tibble with a nested tibble column, data, that holds each series. My getTimeVector function returns one such tibble per series, and unnest then expands them into long format.
  2. Make the plot in a simple way by grouping on the series: group = series.

library(tidyverse)

getTimeVector = function(series) {
  m = sample(seq(-300,300, 50), size = 1,
             prob = c(.02, .02, .1, .2, .1, .02, .02,
                      .1, .25, .1, .02, .03, .02))
  s = sample(c(1, 2, 5, 10), size = 1, prob=c(.5, .3, .15, .05))
  tibble(
    x = 1:1000,
    y = rnorm(1000, m, s)
  )
}

df = tibble(series = 1:1000) %>% 
  mutate(data = map(series, getTimeVector)) %>% 
  unnest(data)

df %>% ggplot(aes(x, y, group=series))+
  geom_line(size=0.1, alpha=0.1)+
  theme_bw() +
  theme(panel.grid=element_blank())


This data organization has an additional benefit: you can easily run separate calculations for each series. For example:

fsum = function(data) tibble(
  meany = mean(data$y),
  mediany = median(data$y),
  sdy = sd(data$y)
)

df %>% filter(series<10) %>% group_by(series) %>% 
  nest() %>% 
  mutate(stat = map(data, fsum)) %>% 
  unnest(stat)

output

# A tibble: 9 x 5
# Groups:   series [9]
  series data                  meany mediany   sdy
   <int> <list>                <dbl>   <dbl> <dbl>
1      1 <tibble [1,000 x 2]>  100.    100.  1.00 
2      2 <tibble [1,000 x 2]> -200.   -200.  2.00 
3      3 <tibble [1,000 x 2]>  150.    150.  2.00 
4      4 <tibble [1,000 x 2]>  150.    150.  1.98 
5      5 <tibble [1,000 x 2]>  100.     99.9 0.988
6      6 <tibble [1,000 x 2]>   99.9    99.9 4.94 
7      7 <tibble [1,000 x 2]>   50.0    50.0 0.983
8      8 <tibble [1,000 x 2]> -150.   -150.  1.00 
9      9 <tibble [1,000 x 2]>   50.0    50.0 0.988
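If the nested data column is not needed in the result, the same per-series statistics can probably be computed more directly with group_by and summarise. This is only a sketch: df_small is a hypothetical stand-in with the same series, x, y columns as df above, and the means of 0, 50 and 100 are arbitrary:

```r
library(dplyr)

set.seed(1)
# Hypothetical stand-in for the long-format df: 3 series of 100 points each
df_small = tibble(
  series = rep(1:3, each = 100),
  x = rep(1:100, times = 3),
  y = rnorm(300, mean = rep(c(0, 50, 100), each = 100))
)

# One row of summary statistics per series, without nest()/unnest()
stats = df_small %>%
  group_by(series) %>%
  summarise(meany = mean(y), mediany = median(y), sdy = sd(y))

print(stats)
```

The nest/map version remains the more flexible choice when the per-series function needs the whole tibble, for example to fit a model.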

Update 1

In a comment on the first answer, @MasterShifu asks how to color the lines by group. With my approach it looks like this:

library(tidyverse)

getTimeVector = function(series) {
  m = sample(seq(-300,300, 50), size = 1,
             prob = c(.02, .02, .1, .2, .1, .02, .02,
                      .1, .25, .1, .02, .03, .02))
  s = sample(c(1, 2, 5, 10), size = 1, prob=c(.5, .3, .15, .05))
  q = ifelse(m>=100, sample(c("bad", "good"), size = 1, prob = c(.75, .25)),
             ifelse(m>=-100, sample(c("bad", "good"), size = 1, prob = c(.5, .5)),
                    sample(c("bad", "good"), size = 1, prob = c(.25, .75))))
  tibble(
    t = 1:1000,
    x = rnorm(1000, m, s),
    quality = q
  )
}



df = tibble(series = 1:1000) %>% 
  mutate(data = map(series, getTimeVector)) %>% 
  unnest(data)

df %>% ggplot(aes(t, x, group=series, color=quality))+
  geom_line(size=0.1, alpha=0.1)+
  theme(panel.grid=element_blank())
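The comment also asked for a solid mean line per group and for green and red hues. One possible way to add both, sketched on a small hypothetical data set (df_small, means and p_groups are illustrative names, and the group means of 1 and -1 are arbitrary), is to compute the mean per quality and time point and draw it as a second geom_line layer:

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Hypothetical stand-in: 20 series of 50 points, columns named as in df above
df_small = tibble(
  series = rep(1:20, each = 50),
  t = rep(1:50, times = 20),
  quality = ifelse(series <= 10, "good", "bad"),
  x = rnorm(1000, mean = ifelse(quality == "good", 1, -1))
)

# Mean of x at each time point, separately for each quality group
means = df_small %>%
  group_by(quality, t) %>%
  summarise(x = mean(x), .groups = "drop")

p_groups = ggplot(df_small, aes(t, x, color = quality)) +
  # Faint individual series, one line per series
  geom_line(aes(group = series), alpha = 0.2) +
  # Opaque mean line per group (grouped by quality via the color mapping)
  geom_line(data = means) +
  scale_color_manual(values = c(good = "green4", bad = "red3")) +
  theme(panel.grid = element_blank())
print(p_groups)
```

With the full 1000-series df, a lower alpha such as the 0.1 used above is likely to work better for the individual lines.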


Marek Fiołka