I've finished clustering my time series, and now I want to use the clusters to replace missing values. My idea is to compute a representative for each cluster and then fill in missing values from that representative. The problem is... I don't really know how to do that.
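The plan above could look roughly like the following base-R sketch. It assumes the data is a numeric matrix with one column per series and one row per time point, plus a vector of cluster memberships in the same column order. `impute_by_cluster()` is a hypothetical helper name, and using the cluster mean at the same time point as the representative is only one possible choice.

```r
# Sketch: fill each missing value with its cluster's mean at the same time point.
# Assumes rows = time points, columns = series, and `cluster` gives the cluster
# of each column (in column order), e.g. the output of cutree().
impute_by_cluster <- function(mat, cluster) {
  stopifnot(is.matrix(mat), ncol(mat) == length(cluster))
  out <- mat
  for (k in unique(cluster)) {
    cols <- which(cluster == k)
    # Representative of cluster k: row-wise mean over its members, NAs ignored.
    # Rows where every member is NA give NaN, so those cells stay missing.
    representative <- rowMeans(mat[, cols, drop = FALSE], na.rm = TRUE)
    for (j in cols) {
      is_missing <- is.na(out[, j])
      out[is_missing, j] <- representative[is_missing]
    }
  }
  out
}

# Toy example: series a and b form cluster 1, series c is alone in cluster 2
m <- cbind(a = c(1, NA, 3), b = c(3, 4, 5), c = c(10, 20, NA))
res <- impute_by_cluster(m, c(1, 1, 2))
res[2, "a"]  # 4: the only observed member of cluster 1 at time 2 is b
```

With the objects defined below, the call would presumably be `impute_by_cluster(as.matrix(df[, -1]), tree)`, since `cutree()` returns memberships in the original column order. Where every member of a cluster is missing at the same time point (always the case for a one-series cluster), the cell stays `NaN`.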
I searched around and found this question, which almost answers my problem (finding a representative would also work for me), but I don't understand it well enough to use it.
library(data.table)
library(dplyr)
library(tidyr)
library(TSclust)
set.seed(1)
df = data.table(
"Time" = c(1,2,3,4,5),
"1" = runif(5),
"2" = runif(5),
"3" = runif(5),
"4" = runif(5),
"5" = runif(5),
"6" = runif(5))
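# Hierarchical clustering of the six series (columns) on Euclidean distance
clusters = hclust(diss(ts(df[, -1]), "EUCL"))
# Cut the tree into 3 groups; memberships come back in the original column order
tree = cutree(clusters, 3)
# Representative of each cluster: mean consumption per cluster and time point
reps = df %>%
  pivot_longer(-Time, names_to = "ID", values_to = "Conso") %>%
  # Look up each series' cluster by its column position instead of relying on row order
  mutate(Cluster = unname(tree[match(ID, names(df)[-1])])) %>%
  group_by(Cluster, Time) %>%
  summarise(Conso = mean(Conso, na.rm = TRUE), .groups = "drop")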
Here's something close to my actual data, along with a naive way to compute the representatives (the mean of each cluster at each time point).
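For comparison, here is a base-R sketch that computes the same kind of representative (mean per cluster and time point) directly in wide form, one column per cluster, without reshaping to long format. It assumes a numeric matrix with series as columns and a membership vector in the same column order (as returned by `cutree()`). `cluster_representatives()` is a hypothetical helper, and `FUN` can be swapped for another summary such as `median`.

```r
# Sketch: per-cluster representatives as a (time points x clusters) matrix.
# Assumes rows = time points, columns = series, and `cluster` gives the
# cluster of each column in column order (e.g. the output of cutree()).
cluster_representatives <- function(mat, cluster, FUN = mean) {
  stopifnot(is.matrix(mat), ncol(mat) == length(cluster))
  # Column indices of the members of each cluster, named by cluster label
  groups <- split(seq_len(ncol(mat)), cluster)
  # For each cluster, summarise its members at every time point (NAs ignored)
  vapply(groups,
         function(cols) apply(mat[, cols, drop = FALSE], 1, FUN, na.rm = TRUE),
         numeric(nrow(mat)))
}

m <- cbind(a = c(1, 2, 3), b = c(3, 4, NA), c = c(10, 20, 30))
cluster_representatives(m, c(1, 1, 2))                # column "1": 2, 3, 3
cluster_representatives(m, c(1, 1, 2), FUN = median)
```

With the toy data above, `cluster_representatives(as.matrix(df[, -1]), tree)` should give the same numbers as the `summarise()` step, laid out as one column per cluster instead of one row per (Cluster, Time) pair. `median` is less sensitive than `mean` to a single unusual member.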
Is this actually an OK way to do it? Do you know a way to extract those representatives from the clusters?
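One common alternative to an averaged series is the cluster medoid: the member series with the smallest total distance to the other members of its cluster, so the representative is an actual observed series. Below is a minimal base-R sketch. It assumes complete series (no NAs when the distances are computed), Euclidean distance as in the `"EUCL"` call above, and a hypothetical helper name, `cluster_medoids()`.

```r
# Sketch: the medoid of each cluster, i.e. the member series whose summed
# Euclidean distance to the other members of its cluster is smallest.
# Assumes rows = time points, named columns = series, no missing values,
# and `cluster` gives the cluster of each column in column order.
cluster_medoids <- function(mat, cluster) {
  stopifnot(is.matrix(mat), ncol(mat) == length(cluster))
  # Pairwise Euclidean distances between series (dist() works on rows)
  d <- as.matrix(dist(t(mat)))
  groups <- split(seq_len(ncol(mat)), cluster)
  vapply(groups, function(cols) {
    within <- d[cols, cols, drop = FALSE]
    colnames(mat)[cols[which.min(rowSums(within))]]
  }, character(1))
}

m <- cbind(s1 = c(0, 0, 0), s2 = c(1, 1, 1), s3 = c(2, 2, 2), s4 = c(10, 10, 10))
cluster_medoids(m, c(1, 1, 1, 2))  # "s2" for cluster 1, "s4" for cluster 2
```

Applied to `as.matrix(df[, -1])` and `tree`, this would return one series ID per cluster, and that column of `df` could then serve as the representative. The trade-off versus the mean is that a medoid is an observed series (so it keeps that series' noise), while the mean is smoother but may not resemble any single member.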