2

I have to color the price plot differently based on the cluster given by the kmeans function. Consider this code:

library(tseries)

####

nomiequity <- "^IXIC" 
datastart  <- "2019-09-18"

nsdq.prices <- get.hist.quote(instrument  = nomiequity,    
                              compression = "d",         
                              start       = datastart, 
                              end         = "2020-12-31", 
                              retclass    = "zoo",
                              quote       = "AdjClose")

b<-kmeans(nsdq.prices,3)
c<-b$cluster
d<- merge(nsdq.prices, c)
e<-split(nsdq.prices, c)

plot(nsdq.prices, type="l", col="green", ylim=c(6000, 13000))
    lines(e[["2"]], type = "l", col="red")
    lines(e[["3"]], type = "l", col="blue")

enter image description here The result is almost what I need to do, but I don't want to show those connection between same colors in diffent time.

jay.sf
  • 60,139
  • 8
  • 53
  • 110
Dani.Tav
  • 37
  • 4

3 Answers3

1

The problem is that the clusters get merged in the lines. You could use the rle lengths to increase the number by one when the clusters change in the time series. For this use Map to repeat consecutive numbers l times. Then you are able to split on these growing numbers but use cluster to define the color of the lines. For the latter use lapply to loop over the splitted e.

cl <- kmeans(nsdq.prices, 3)$cluster
l <- rle(as.numeric(cl))$lengths
s <- Map(rep, seq(l), l)
e <- split(cbind(nsdq.prices, cl), unlist(s))

plot(nsdq.prices, type="l", col=7, ylim=c(6000, 13000))
invisible(lapply(e, function(x) lines(x$Adjusted, col=x$cl + 1)))
legend("topleft", leg=c(sprintf("cl %s", 1:3), "missing"), col=c((1:3)+1, 7), lty=1)

enter image description here

Where there's no date defined there appear gaps. We could use the zoo interpolations by overplotting the original plot using a "missing" color.

jay.sf
  • 60,139
  • 8
  • 53
  • 110
0

I hope this piece of code helps you:

library(tseries)

####

nomiequity <- "^IXIC" 
datastart  <- "2019-09-18"

nsdq.prices <- get.hist.quote(instrument  = nomiequity,    
                              compression = "d",         
                              start       = datastart, 
                              end         = "2020-12-31", 
                              retclass    = "zoo",
                              quote       = "AdjClose")

b<-kmeans(nsdq.prices,3)
c<-b$cluster
d<- merge(nsdq.prices, c)
e<-split(nsdq.prices, c)


e3= e[['3']]
empty <- zoo(order.by=seq.Date(head(index(e3),1),tail(index(e3),1),by="days"))
e3=merge(e3,empty)


e2= e[['2']]
empty <- zoo(order.by=seq.Date(head(index(e2),1),tail(index(e2),1),by="days"))
e2=merge(e2,empty)


plot(nsdq.prices, type="l", col="green", ylim=c(6000, 13000))
lines(e2, type = "l", col="red")
lines(e3, type = "l", col="blue")

I just used this: R: Filling missing dates in a time series?

LocoGris
  • 4,432
  • 3
  • 15
  • 30
  • Thank you LocoGris! But the generated plot is the same as mine. The problem is that in the same "line" observations should be coloured differently without showing connections between non-adiacent observations of the same cluster. – Dani.Tav Feb 24 '21 at 11:53
  • 1
    Is geom_point not easier for visualisation in this case? ggplot(d, aes(x = Index, y = Adjusted, col = as.factor(c))) + geom_point() + scale_color_manual(values = c("green", "red", "blue")) – maarvd Feb 24 '21 at 11:56
  • Thank you maarvd! This might be a good solutions, but there is a way to use lines instead of points? – Dani.Tav Feb 24 '21 at 12:02
0

With this methodology I found the rows where cluster value changes, then selected those dates and added NA values for adjusted. geom_line breaks when NA values occur, hence giving the desired result.

library(ggplot2)
library(dplyr)
library(data.table)

#add date as a column
datum <- index(d)
d <- as.data.table(d)
d$Datum <- datum

#add row number as a column
d$Row <- 1:nrow(d)

#find rows of d where cluster value changes
rows <- which(d$c != dplyr::lag(d$c))
rows <- d[Row %in% rows]

#add NA values for Adjusted at the Dates where values change(which breaks the geom_line)
rows <- rows[, Adjusted := NA]

#merge rows (NA values of Adjusted) with d
d <- rbind(rows, d)

#create a plot
ggplot(d, aes(x = Datum, y = Adjusted, col = as.factor(c))) + geom_line() + scale_color_manual(values = c("green", "red", "blue"))

enter image description here

maarvd
  • 1,254
  • 1
  • 4
  • 14