
I am trying to create spaghetti plots from data consisting of numeric values over time. The data set is large, so I cannot paste it here, but this is what I tried:

matplot(x,y,type="l",lty=1,col="#00000020")

This is the plot I get: [image]

Ideally, I would like it to look something like this: [image]

How can I "smooth" the lines and make them overlap less, as in the lower plot? If at all possible, without using ggplot2.

Oposum

1 Answer


Just smooth the data. There are any number of ways to do it; here is one example using loess:

set.seed(1)  # seed before the first sample() call so the data are reproducible
nr <- 200
# 15 time points (rows) x nr series (columns)
mm <- t(matrix(sample(0:4, nr * 15, replace = TRUE), nr))
mm[sample(length(mm), nr * 15 / 20)] <- NA  # set 5% of the values to NA
x <- 1:15

par(mfrow = c(1,2))
# left panel: the raw lines
matplot(mm, type = 'l', lty = 1, xlim = c(0,15), ylim = c(-5,10),
        col = adjustcolor('black', alpha.f = .1))
# right panel: empty plot with a grid, then one loess curve per series
plot(NA, xlim = c(0,15), ylim = c(-5,10), xlab = '', ylab = '',
     panel.last = grid(), bty = 'l')
for (ii in seq_len(ncol(mm))) {
  dd <- data.frame(y = mm[, ii], x = x)
  lo <- loess(y ~ x, data = dd, na.action = 'na.omit')
  # lo <- loess(mm[, ii] ~ x)
  xl <- seq(min(x), max(x), (max(x) - min(x)) / 1000)
  lines(xl, predict(lo, xl), col = adjustcolor('black', alpha.f = .1))
}
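As another example of "any number of ways", here is a minimal sketch that swaps loess for `smooth.spline` from base R and draws the result with `matplot`, as in the question. The smoothing level `spar = 0.7`, the grid size of 200 and the rule of skipping series with fewer than 4 observed points are assumptions chosen for illustration, not values taken from the question.

```r
set.seed(1)
nr <- 200
mm <- t(matrix(sample(0:4, nr * 15, replace = TRUE), nr))  # 15 time points x nr series
mm[sample(length(mm), nr * 15 / 20)] <- NA                 # 5% missing values
x <- 1:15
xl <- seq(min(x), max(x), length.out = 200)                # fine grid for drawing

# One smoothed curve per column; spar = 0.7 is an arbitrary illustrative choice
smoothed <- sapply(seq_len(ncol(mm)), function(ii) {
  ok <- !is.na(mm[, ii])
  # smooth.spline needs at least 4 distinct x values; otherwise return all NA
  if (sum(ok) < 4) return(rep(NA_real_, length(xl)))
  fit <- smooth.spline(x[ok], mm[ok, ii], spar = 0.7)
  predict(fit, xl)$y
})

# Each column of 'smoothed' is one curve, so matplot can draw them all at once
matplot(xl, smoothed, type = 'l', lty = 1,
        col = adjustcolor('black', alpha.f = .1))
```

A larger `spar` gives flatter curves and a smaller one follows the raw points more closely, so it is worth trying a few values on the real data.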


rawr
  • I only get very few short lines this way. Could it be related to the fact that I have a lot of NAs in the data frame? I should be able to produce at least as many smooth lines as I have regular overlapping lines. – Oposum Nov 01 '15 at 20:08
  • have no idea. it's always better to provide a reproducible example – rawr Nov 01 '15 at 21:10
  • My data is huge and I cannot paste it. Would it be possible to randomly substitute around half of your numbers with NAs? Would it still work? – Oposum Nov 01 '15 at 22:58
  • My data is huge and I cannot paste it. But I figured out what the problem is and can show it on your example. I can reproduce a similar problem (only a few lines) if I change your code as follows: `nr <- 50`, and introduce NAs in `mm` like this: `mm[,c(4,7,12,13)]<-NA; mm[c(3,5,8,9,10,13),]<-NA` . Then the plot looks very similar to what I'm getting. – Oposum Nov 01 '15 at 23:11
  • @oposum just `mm[c(3,5,8,9,10,13),]<-NA` works fine. the other would be like having an observation with all `NA`s, there's no point in plotting those since you have no data, can't you just remove rows/columns that are all NA? – rawr Nov 02 '15 at 02:54
  • The problem is that it is not entire columns / rows that are NA. The example above was just to show you the effect of missing data, but in my case the NAs are truly random. So removing complete rows or columns won't help. Is there a way to ignore NAs in each individual plot curve? It worked in matplot. – Oposum Nov 02 '15 at 04:07
  • try it with the edits, it still works on the example – rawr Nov 02 '15 at 04:17
  • With the edits, I get this message: `Error in predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, : NA/NaN/Inf in foreign function call (arg 5)` . But I don't get this error message with your example, which has far fewer NAs than mine. – Oposum Nov 02 '15 at 04:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/94034/discussion-between-oposum-and-rawr). – Oposum Nov 02 '15 at 23:07
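One possible cause of the short lines and the `predLoess` error discussed in the comments is that some series have too few non-missing points for a loess fit. A minimal sketch, assuming that is the case: fit each series only on its observed points, skip series that are too sparse, and catch fits that still fail. The helper name `smooth_series`, the threshold `min_points = 5` and the grid size are hypothetical choices for illustration, and the data are simulated with about half of the values missing at random.

```r
# Hypothetical helper: loess-smooth one series on a fine grid.
# Returns NULL when there are too few observed points or the fit fails.
smooth_series <- function(x, y, min_points = 5, n_grid = 200) {
  ok <- is.finite(x) & is.finite(y)
  if (sum(ok) < min_points) return(NULL)          # too sparse to smooth
  dd <- data.frame(x = x[ok], y = y[ok])
  fit <- tryCatch(suppressWarnings(loess(y ~ x, data = dd)),
                  error = function(e) NULL)
  if (is.null(fit)) return(NULL)
  # Predict only inside this series' observed range (loess does not extrapolate)
  xl <- seq(min(dd$x), max(dd$x), length.out = n_grid)
  yl <- tryCatch(suppressWarnings(predict(fit, data.frame(x = xl))),
                 error = function(e) NULL)
  if (is.null(yl)) return(NULL)
  list(x = xl, y = yl)
}

# Simulated data: 15 time points x 50 series, about half of the values missing
set.seed(1)
nr <- 50
mm <- t(matrix(sample(0:4, nr * 15, replace = TRUE), nr))
mm[sample(length(mm), length(mm) / 2)] <- NA
x <- 1:15

plot(NA, xlim = c(0, 15), ylim = c(-5, 10), xlab = '', ylab = '', bty = 'l')
drawn <- 0
for (ii in seq_len(ncol(mm))) {
  sm <- smooth_series(x, mm[, ii])
  if (is.null(sm)) next                           # skip series that could not be fitted
  lines(sm$x, sm$y, col = adjustcolor('black', alpha.f = .2))
  drawn <- drawn + 1
}
```

Comparing `drawn` with `ncol(mm)` shows how many series were skipped. If many are skipped on the real data, lowering `min_points` or using a simpler smoother for sparse series may be an option.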