I have the data frame below, which I process in order to create a cluster scatter plot with:

library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(plotly)
df <- USArrests
df <- na.omit(df)

df <- scale(df)
distance <- get_dist(df)

k2 <- kmeans(df, centers = 2, nstart = 25)
df %>%
  as_tibble() %>%
  mutate(cluster = k2$cluster,
         state = row.names(USArrests))
p2<-fviz_cluster(k2, data = df, geom="point")
#+ scale_fill_discrete(name = "Cluster", labels = c("1", "2", "3","4"))
p2
ggplotly(p2)

When I use ggplotly(), the legend names change, so I'm looking for a way to set them manually or even to hide the legend altogether.

firmo23

1 Answer

The easiest way I came across is to rename the label within the plotly object.

p2<-fviz_cluster(k2, data = df, geom="point")

p3 <- ggplotly(p2)

p3[["x"]][["data"]][[2]][["name"]] <- "2"
p3

It's not pretty but helps in the short term.
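
The question also asks about hiding the legend altogether. A minimal sketch of one way to do that, assuming only ggplot2 and plotly are installed: it uses a small stand-in plot (the names `demo_plot`, `demo_int` and `demo_nolegend` are made up for illustration) rather than the `fviz_cluster()` plot, and switches the legend off through the plotly layout.

```r
library(ggplot2)
library(plotly)

# Stand-in plot with a colour legend (not the fviz_cluster plot from the question)
demo_plot <- ggplot(USArrests, aes(Murder, Assault, colour = UrbanPop > 65)) +
  geom_point()

# Convert to an interactive plotly object
demo_int <- ggplotly(demo_plot)

# Switch the whole legend off in the plotly layout;
# plotly::layout is spelled out to avoid confusion with graphics::layout
demo_nolegend <- plotly::layout(demo_int, showlegend = FALSE)

# Print demo_nolegend in an interactive session to view the plot
```

The same call should apply to `p3` from the snippet above, i.e. `plotly::layout(p3, showlegend = FALSE)`.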

Edit: There was more than one question. The first is about the legend labels; the second is about interactive points in the plot. Most of the example code was given; the only change is the new `centers` variable.

# Example
library(tidyverse)  # data manipulation
library(cluster)    # clustering algorithms
library(factoextra) # clustering algorithms & visualization
library(plotly)
df <- USArrests
df <- na.omit(df)

df <- scale(df)
distance <- get_dist(df)

# Added a `centers` variable for the number of centers in kmeans.
# It is also used later to select elements from the ggplotly object.

centers <- 4
k2 <- kmeans(df, centers = centers, nstart = 25)
df %>%
  as_tibble() %>%
  mutate(cluster = k2$cluster,
         state = row.names(USArrests))

p2<-fviz_cluster(k2, data = df, geom="point")

p2
p3 <- ggplotly(p2)

# Solution
# First problem: changing the legend labels.
# The conversion from ggplot to ggplotly mangles the legend when
#   several scales are combined, as here (color and shape).
# I don't know why it looks as intended when only the point
#   layers are renamed.

for (i in seq_len(centers)) {
  p3[["x"]][["data"]][[i]][["name"]] <- as.character(i)
}

# Second problem: interactive points.
# ggplot stores the data in one list, whereas ggplotly splits it
#    by layer and cluster.
# For the hover labels it is enough to change the point layers
#    (the first `centers` traces).
# To add more information to the labels,
#   edit the variable name_states with HTML.
for (i in seq_len(centers)) {
  name_states <- p2[["data"]] %>%
    filter(cluster == i) %>%
    select(name)

  p3[["x"]][["data"]][[i]][["text"]] <- as.vector(name_states$name)
}

# Reverse the order of the traces: the polygon layer is on top and
#    makes it impossible to hover over the points beneath.
p3[["x"]][["data"]] <- p3[["x"]][["data"]][(centers*3):1]

# Now you can hover over every point and can see the state name
p3
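
The loops above rely on the point traces coming first. As an alternative, here is a rough sketch that cleans the name of every trace instead of indexing by position. It assumes the mangled names look like `(2,1)`, as reported in the comments; `clean_trace_names` is a hypothetical helper, not part of plotly, and the toy list only stands in for `p3[["x"]][["data"]]`.

```r
# Hypothetical helper: turn names such as "(2,1)" into "2" on every trace.
# Traces without a name are returned unchanged.
clean_trace_names <- function(traces) {
  lapply(traces, function(tr) {
    nm <- tr[["name"]]
    if (!is.null(nm)) {
      # Keep only the part before the first comma inside the parentheses
      tr[["name"]] <- sub("^\\(([^,]+),.*\\)$", "\\1", as.character(nm))
    }
    tr
  })
}

# Toy stand-in for p3[["x"]][["data"]] (assumed shape, for illustration only)
toy_traces <- list(
  list(name = "(1,1)", mode = "markers"),
  list(name = "(2,1)", mode = "markers"),
  list(name = "3", mode = "markers"),
  list(mode = "lines")
)

cleaned <- clean_trace_names(toy_traces)

# Collect the cleaned names; unnamed traces show up as NA
cleaned_names <- vapply(
  cleaned,
  function(tr) if (is.null(tr[["name"]])) NA_character_ else tr[["name"]],
  character(1)
)
```

On the real object this would be applied as `p3[["x"]][["data"]] <- clean_trace_names(p3[["x"]][["data"]])`. Whether the legend then shows one entry per cluster still depends on how `ggplotly()` split the traces, so check the result with `View(p3[["x"]][["data"]])`.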

  • Interesting but how could this work with 4 clusters (centers=4) instead of 2? I replace 2 with 4 everywhere and its wrong again – firmo23 Aug 26 '19 at 21:15
  • Well, sorry that this is more a fix than a solution. [https://github.com/ropensci/plotly/issues/1164] They seem to have a similar problem. If I understand them correctly, it's because `fviz_cluster()` produces multiple scales and `ggplotly()` can't handle them. [https://stackoverflow.com/questions/49133395/strange-formatting-of-legend-in-ggplotly-in-r] This seems similar; he suggests building it directly with `plotly`. – Johannes Stötzer Aug 26 '19 at 22:00
  • Hmm I see. The issue is that I cannot find the same plot directly with plotly. Your answer above would be great if it could work with 4 clusters – firmo23 Aug 26 '19 at 22:06
  • If you want to change it manually, look into the `View(p3[["x"]][["data"]])`. There you see the object with all the layers. There you see 6 lists when you have 2 centers and 12 when 4. I tried your code with 4 centers and changed p3[["x"]][["data"]][[4]][["name"]] <- "4" and it worked like before. – Johannes Stötzer Aug 26 '19 at 22:07
  • are u sure? It gives me names:1,(2,1)(3,1),4 – firmo23 Aug 26 '19 at 22:12
  • wow u re awesome just saw your edit. So this https://stackoverflow.com/questions/57663740/manually-set-the-legend-names may have the same logic? – firmo23 Aug 26 '19 at 22:31
  • Sure, I worked with your example data and code. Or what do you mean specifically? – Johannes Stötzer Aug 26 '19 at 22:48
  • I would like to find a way to set manually the hover info text in the Q above in order to display the rownames (states) – firmo23 Aug 26 '19 at 22:52
  • It is probably possible. In `p3[["x"]][["data"]][[1]][["text"]]` you can see what goes into the hover text; you could replace it with just the state names. – Johannes Stötzer Aug 26 '19 at 23:08
  • p3[["x"]][["data"]][[1]][["text"]]<-rownames(USArrests) does not change something in the plot – firmo23 Aug 26 '19 at 23:30
  • would be better if we could continue the conversation in the respective Question link I attached a few comments before – firmo23 Aug 27 '19 at 08:41
  • your suggestion works, if you put it as an answer in the respective Q I ll accept it! THANKS – firmo23 Aug 29 '19 at 13:29
  • No it is one question. I meant that you could post your answer here and I ll accept it regarding the legend names – firmo23 Aug 30 '19 at 23:49