Since you have more than 1000 stores, the below demo might not be applicable directly but hopefully point you in the right direction.
You can analyze clustering of stores for a weekly basis or any another variant of weeks(N=4,8,...)
Here we look at store clusters at weekly frequency:
Test Data
library(dplyr) #data manipulation
library(ggdendro) #extracting clusters
library(ggplot2) #plotting
library(gridExtra) #for arranging ggplot graphs in grids
set.seed(42)
DF <- data.frame(weeks = rep(seq(1:5),5),
store = c(rep("A", 5), rep("B", 5), rep("C", 5),
rep("E", 5), rep("D", 5)),
sales = rnorm(25),
pp = rnorm(25))
weekInput = unique(DF$weeks)
Choosing frequency of weeks:
In the simple case, I have included discrete weeks i.e. 1,2,3,4 and 5.
Suppose you wish to choose N = 4 as frequency of weeks i.e. weeks 1-to-4,5-to-8 etc, you can use following to create partitions of weekInput over the year and modify as per your requirements.
#weekFreq = 4
#StartWeek = 1
#EndWeek = 52
#startPoints=seq(StartWeek,EndWeek,weekFreq)
#endPoints= c(tail(startPoints,-1)-1,EndWeek)
#
#freqDF = data.frame(cbind(startPoints,endPoints))
#weekInput = lapply(1:nrow(freqDF),function(x) { z= freqDF[x,]; z=as.vector(as.matrix(z)) } )
#head(weekInput)
#[[1]]
#[1] 1 4
#
#[[2]]
#[1] 5 8
#
#[[3]]
#[1] 9 12
#
#[[4]]
#[1] 13 16
#
#[[5]]
#[1] 17 20
#
#[[6]]
#[1] 21 24
Plotting of clusters
Good resources for plotting various dendrograms are here
and here
For each week, we calculate distance matrix over numeric data and create hierarchical clusters and use ggdendro package for plotting and output a list of plot objects
plotList = lapply(weekInput,function(x) {
subsetWeek=DF %>%
group_by(weeks) %>%
filter(weeks==x) %>% #you could change this to `weeks %in% c(x[1],x[2])`
as.data.frame() %>% # x[1] and x[2] are start and end points of weekInput
select(-weeks) %>%
as.data.frame()
#For numeric features of data, compute the distance matrix and form hierarchical cluster
numericDF= subsetWeek[,sapply(subsetWeek,is.numeric)]
clustDF = hclust(dist(numericDF))
#You can choose to limit the clusters to N = n, as per your discretion
#clustDF = cutree(clustDF, 4)
clustDF$labels = subsetWeek$store
#Use functions from ggdendro package for extracting clusters for ease in plotting
clustDendro = as.dendrogram(clustDF)
dendroData = dendro_data(clustDendro,type="rectangle")
Labels = label(dendroData)
Labels$group <- c(rep("Area1", 2), rep("Area2", 2), rep("Area3", 1))
gPlot = ggplot(segment(dendroData)) +
geom_segment(aes(x=x,y=y,xend=xend,yend=yend)) +
geom_label(data=Labels,aes(label=label,x=x,y=0,label.size=5,colour=Labels$group,fontface="bold")) +
ggtitle(paste0("Store Clusters for Week:",x)) +
labs(color="Area Names\n")
gPlot = gPlot + theme(legend.title = element_text(face = "bold"))
return(gPlot)
})
Arranging Plots
The list of plot objects from above can be arranged as per requirement.
Useful link on more details are here
grid::grid.newpage()
grid::grid.draw(do.call(rbind,lapply(plotList,function(x) ggplotGrob(x))))
Single Week:

All Weeks
Due to single column arrangement, readability is slightly affected
