
I am trying to run the HDBSCAN algorithm in R via the largeVis package. To visualize the clusters, I am using the gplot function in largeVis. Is it possible to change the labels of the data points in the plot from integers to strings? I am using the Iris dataset with a small modification: the "class" column (column 5) is used to build the row names. Is it possible to show these row names in the plot instead of the node numbers?

x1 <- iris[,-5]
row.names(x1) <- paste0("Iris-", iris[,5], " ", 1:nrow(x1))
View(x1)

[Image: Iris_modified row headers]

vis <- largeVis::largeVis(x1)
clustering <- largeVis::hdbscan(vis)
largeVis::gplot(clustering,t(vis$coords), text = TRUE)


MrFlick
Div Trivedi
  • It's easier to help you if you provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Pictures of data aren't helpful. – MrFlick Mar 29 '17 at 19:46
  • I am not sure how to provide an example that would generate data consisting of clusters. Would it help to create a data frame with 10 columns and 1000 rows, with the first row as column headers and the first column as row headers, and then fill it with randomly generated numbers? – Div Trivedi Mar 29 '17 at 20:21
  • Often the help pages for functions have examples that use built-in data sets. Usually it's best to adapt those to recreate your problem. – MrFlick Mar 29 '17 at 20:33
  • @MrFlick I edited the question to use the Iris dataset. I hope this is now a reproducible problem. – Div Trivedi Apr 05 '17 at 20:21

1 Answer


The function itself doesn't have an easy option to plot the row names, but it does return a ggplot object, and you can add additional layers to that. Here's how you can plot with the row names:

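library(ggplot2)
# Draw the clusters without the default integer labels, then add the row names.
# `label` is the zero-based node index in the plot's data, hence the + 1.
pp <- largeVis::gplot(clustering, t(vis$coords), text = FALSE) +
  geom_label(aes(label = rownames(x1)[label + 1]), size = 2.5, label.size = 0.1, alpha = 0.7)
pp  # display the plot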

Internally, the function builds a data.frame and indexes the nodes starting at 0 (for some very non-R-like reason). We can use that index to look up the row name for each observation and use it as the label. Here I kept most of the styling used by the function's default options.
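
To make the index shift concrete, here is a minimal sketch in base R that does not need largeVis. The `label` vector below is a hypothetical stand-in for the zero-based node index described above; only the `+ 1` lookup is the point.

```r
# Rebuild the row names exactly as in the question.
x1 <- iris[, -5]
rownames(x1) <- paste0("Iris-", iris[, 5], " ", seq_len(nrow(x1)))

# Hypothetical zero-based node indices, standing in for the plot's `label` column.
label <- c(0, 49, 50, 149)

# R vectors are one-based, so shift by 1 before indexing the row names.
node_names <- rownames(x1)[label + 1]
print(node_names)
# → "Iris-setosa 1" "Iris-setosa 50" "Iris-versicolor 51" "Iris-virginica 150"
```

Without the `+ 1`, index 0 would silently drop out of the result and every other label would be off by one row.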


MrFlick
  • My guess is that you can change the colors the way you would with any other ggplot object, with `scale_color_manual()` or something like that. – MrFlick Jun 30 '17 at 16:52
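
As a rough illustration of that suggestion, here is a sketch of `scale_color_manual()` on a plain ggplot2 scatter plot. The data frame, its column names, and the palette are made up for the example, and it is an assumption that the plot returned by gplot maps clusters to the `color` aesthetic in the same way.

```r
library(ggplot2)

# Stand-in data: two coordinates plus a cluster factor (hypothetical names).
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, 3, 6, 5, 7),
  cluster = factor(c(1, 1, 1, 2, 2, 2))
)

# Hypothetical palette, named by cluster level.
cluster_colors <- c("1" = "steelblue", "2" = "firebrick")

# Map cluster to color, then override the default palette manually.
p <- ggplot(df, aes(x = x, y = y, color = cluster)) +
  geom_point() +
  scale_color_manual(values = cluster_colors)
p
```

If this works the same way for the largeVis plot, adding `scale_color_manual(values = ...)` to `pp` would be the equivalent step.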