Different color in the same price plot

Question

I have to color the price plot differently based on the cluster given by the kmeans function. Consider this code:

library(tseries)

####

nomiequity <- "^IXIC" 
datastart  <- "2019-09-18"

nsdq.prices <- get.hist.quote(instrument  = nomiequity,    
                              compression = "d",         
                              start       = datastart, 
                              end         = "2020-12-31", 
                              retclass    = "zoo",
                              quote       = "AdjClose")

b<-kmeans(nsdq.prices,3)
c<-b$cluster
d<- merge(nsdq.prices, c)
e<-split(nsdq.prices, c)

plot(nsdq.prices, type="l", col="green", ylim=c(6000, 13000))
    lines(e[["2"]], type = "l", col="red")
    lines(e[["3"]], type = "l", col="blue")

The result is almost what I need to do, but I don't want to show those connection between same colors in diffent time.

jay.sf · Accepted Answer · 2021-02-24T13:57:09.870

1

The problem is that the clusters get merged in the lines. You could use the rle lengths to increase the number by one when the clusters change in the time series. For this use Map to repeat consecutive numbers l times. Then you are able to split on these growing numbers but use cluster to define the color of the lines. For the latter use lapply to loop over the splitted e.

cl <- kmeans(nsdq.prices, 3)$cluster
l <- rle(as.numeric(cl))$lengths
s <- Map(rep, seq(l), l)
e <- split(cbind(nsdq.prices, cl), unlist(s))

plot(nsdq.prices, type="l", col=7, ylim=c(6000, 13000))
invisible(lapply(e, function(x) lines(x$Adjusted, col=x$cl + 1)))
legend("topleft", leg=c(sprintf("cl %s", 1:3), "missing"), col=c((1:3)+1, 7), lty=1)

Where there's no date defined there appear gaps. We could use the zoo interpolations by overplotting the original plot using a "missing" color.

edited Feb 24 '21 at 13:57

answered Feb 24 '21 at 12:33

jay.sf

60,139
8
53
110

Thank you jay.sf! So there's no way to keep the original shape of the chart without separating the various parts, right? – Dani.Tav Feb 24 '21 at 13:36
@Dani.Tav Probably not, since a line is one object and needs to be separated when needing different colors. – jay.sf Feb 24 '21 at 13:42
@Dani.Tav Or, if you mean the gaps, I slightly changed the code to get a continuous line. – jay.sf Feb 24 '21 at 13:57
Oh I see. I think I'll work on this solution. Thank you! – Dani.Tav Feb 24 '21 at 14:37

score 0 · Answer 2 · answered Feb 24 '21 at 11:18

I hope this piece of code helps you:

library(tseries)

####

nomiequity <- "^IXIC" 
datastart  <- "2019-09-18"

nsdq.prices <- get.hist.quote(instrument  = nomiequity,    
                              compression = "d",         
                              start       = datastart, 
                              end         = "2020-12-31", 
                              retclass    = "zoo",
                              quote       = "AdjClose")

b<-kmeans(nsdq.prices,3)
c<-b$cluster
d<- merge(nsdq.prices, c)
e<-split(nsdq.prices, c)


e3= e[['3']]
empty <- zoo(order.by=seq.Date(head(index(e3),1),tail(index(e3),1),by="days"))
e3=merge(e3,empty)


e2= e[['2']]
empty <- zoo(order.by=seq.Date(head(index(e2),1),tail(index(e2),1),by="days"))
e2=merge(e2,empty)


plot(nsdq.prices, type="l", col="green", ylim=c(6000, 13000))
lines(e2, type = "l", col="red")
lines(e3, type = "l", col="blue")

I just used this: R: Filling missing dates in a time series?

Thank you LocoGris! But the generated plot is the same as mine. The problem is that in the same "line" observations should be coloured differently without showing connections between non-adiacent observations of the same cluster. — Dani.Tav, Feb 24 '21 at 11:53
Is geom_point not easier for visualisation in this case? ggplot(d, aes(x = Index, y = Adjusted, col = as.factor(c))) + geom_point() + scale_color_manual(values = c("green", "red", "blue")) — maarvd, Feb 24 '21 at 11:56
Thank you maarvd! This might be a good solutions, but there is a way to use lines instead of points? — Dani.Tav, Feb 24 '21 at 12:02

score 0 · Answer 3 · answered Feb 24 '21 at 13:38

With this methodology I found the rows where cluster value changes, then selected those dates and added NA values for adjusted. geom_line breaks when NA values occur, hence giving the desired result.

library(ggplot2)
library(dplyr)
library(data.table)

#add date as a column
datum <- index(d)
d <- as.data.table(d)
d$Datum <- datum

#add row number as a column
d$Row <- 1:nrow(d)

#find rows of d where cluster value changes
rows <- which(d$c != dplyr::lag(d$c))
rows <- d[Row %in% rows]

#add NA values for Adjusted at the Dates where values change(which breaks the geom_line)
rows <- rows[, Adjusted := NA]

#merge rows (NA values of Adjusted) with d
d <- rbind(rows, d)

#create a plot
ggplot(d, aes(x = Datum, y = Adjusted, col = as.factor(c))) + geom_line() + scale_color_manual(values = c("green", "red", "blue"))

Different color in the same price plot

3 Answers3