
Sorry if this is a rookie question, and for the long post. Thank you in advance. I have a dataset of 88,250 rows and 131 columns: rows are observations, and columns are labels and variables (columns 1:21 are character labels and columns 22:131 are numeric (double) variables). I am trying to use UMAP from the uwot package to visualise the data and later perform supervised training. The first thing I tried was to tune the UMAP parameters, namely n_neighbors and min_dist. The UMAP output is a matrix of X and Y coordinates, which I can attach to my data frame and then plot. Here is the code for one chosen set of parameters. It draws a scatter plot, which I then convert to a 2D density plot to visualise differences between treatments, hence the facet_wrap.

library(uwot)
library(ggplot2)

# define the labels and the numeric data
df.labels <- df[, 1:21]
df.data <- df[, 22:131]

# apply the UMAP transformation
df.umap <- umap(df.data, n_sgd_threads = 0, n_trees = 500, n_neighbors = 50,
                min_dist = 0.2, pca = 50,
                verbose = TRUE)

df$UMAPX <- df.umap[, 1]
df$UMAPY <- df.umap[, 2]

m <- ggplot(df, aes(x = UMAPX, y = UMAPY)) +
  geom_point() +
  scale_x_continuous(name = "UMAP_X-axis_coordinates") +
  scale_y_continuous(name = "UMAP_Y-axis_coordinates") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank(),
        axis.line = element_line(colour = "black",
                                 linewidth = 0.1,  # called `size` before ggplot2 3.4.0
                                 linetype = "solid")) +
  labs(title = "UMAP visualisation")

# try a 2D density plot to see the distribution per treatment
m +
  geom_density_2d() +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +  # replaces ..level..
  scale_fill_gradient(low = "blue", high = "red") +
  facet_wrap(~ treatmentsum)  # treatmentsum is one of the label columns in df
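A self-contained stand-in for the pipeline above, since the original `df` is not available: a minimal sketch assuming the uwot and ggplot2 packages are installed, with the built-in `iris` data playing the role of `df.data` and `Species` standing in for `treatmentsum`. The parameter values are illustrative, not tuned.

```r
library(uwot)
library(ggplot2)

set.seed(42)  # UMAP is stochastic; fixing the seed makes the layout repeatable

# stand-ins: iris[, 1:4] plays the role of df.data, Species the role of treatmentsum
iris_data <- iris[, 1:4]
iris_umap <- umap(iris_data, n_neighbors = 15, min_dist = 0.2)

# attach the 2D coordinates to the labels in one data frame
plot_df <- data.frame(
  UMAPX   = iris_umap[, 1],
  UMAPY   = iris_umap[, 2],
  Species = iris$Species
)

# density polygons plus points, one panel per label value
p <- ggplot(plot_df, aes(x = UMAPX, y = UMAPY)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  geom_point(size = 0.5) +
  scale_fill_gradient(low = "blue", high = "red") +
  facet_wrap(~ Species) +
  labs(title = "UMAP visualisation (iris stand-in)")

print(p)
```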

Now I want to write loops that store all the UMAP results in a list, where each element is a data frame with the UMAP X and Y coordinates for one tested pair of parameter values. This worked and I got my list.

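# attempt a grid search for hyperparameter tuning

# iterate over the grid, set manually
# performance evaluation

n_neighbors.test <- seq(1, 100, 20)   # 1, 21, 41, 61, 81
min_dist.test <- seq(0.05, 4, 0.5)    # 0.05, 0.55, ..., 3.55
# note: min_dist is normally set relative to spread (default 1), so values above 1 are unusual

# create a data frame containing all combinations of the grid (5 x 8 = 40 rows)
hyper_grid <- expand.grid(n_neighbors = n_neighbors.test, min_dist = min_dist.test)

# create an empty list to store the models
models <- list()

# execute the grid search
for (i in seq_len(nrow(hyper_grid))) {
  # get the value pair at row i
  n_neighbors <- hyper_grid$n_neighbors[i]
  min_dist <- hyper_grid$min_dist[i]

  # train a model with this pair and store its embedding in the list
  # (the pair must be passed to umap(), otherwise every model uses the defaults)
  models[[i]] <- umap(df.data, n_neighbors = n_neighbors, min_dist = min_dist,
                      n_sgd_threads = 0, n_trees = 500)
}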

# put the x, y coordinates from the UMAP grid search into a list of data frames for later visualisation

para <- list()

for (i in seq_along(models)) {
  d <- df
  d$UMAPX <- models[[i]][, 1]
  d$UMAPY <- models[[i]][, 2]
  para[[i]] <- d
}
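A sketch of the same grid search written with `lapply()`, assuming the goal is one data frame per parameter pair. It passes `n_neighbors` and `min_dist` to `umap()` explicitly and stores them as columns next to the coordinates, so each data frame carries its own label for later plotting. `iris` and the 2 × 2 grid are illustrative stand-ins for `df`/`df.data` and the full 40-row `hyper_grid`.

```r
library(uwot)

set.seed(42)
iris_data <- iris[, 1:4]

# small illustrative grid; the grid in the question has 5 x 8 = 40 rows
grid <- expand.grid(n_neighbors = c(10, 30), min_dist = c(0.1, 0.5))

# one data frame per parameter pair: label + coordinates + the parameters used
para <- lapply(seq_len(nrow(grid)), function(i) {
  emb <- umap(iris_data,
              n_neighbors = grid$n_neighbors[i],
              min_dist    = grid$min_dist[i])
  data.frame(iris["Species"],
             UMAPX       = emb[, 1],
             UMAPY       = emb[, 2],
             n_neighbors = grid$n_neighbors[i],
             min_dist    = grid$min_dist[i])
})
```

Because each element records its own parameters, `do.call(rbind, para)` would give one long data frame that could be faceted by `n_neighbors` and `min_dist`, if a single combined figure is ever wanted.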

Here is where I got stuck. I want to loop this ggplot code over each data frame in the list, using its UMAPX and UMAPY columns as x and y. The aim is to generate 40 plots, one per tested pair of n_neighbors and min_dist, each a 15-panel facet_wrap. I thought I could turn the previous ggplot code into a function and use map to apply it to every element of the list para, but the resulting plot list contains only NULLs, and no error is returned. The PDF file written afterwards is empty.

library(purrr)
plot<- map(para,function(i){
  for (i in 1:40) {
    ggplot(para[[i]], aes(x=UMAPX ,y=UMAPY))+
      geom_point()+
      scale_x_continuous(name = "UMAP_X-axis_coordinates")+
      scale_y_continuous(name = "UMAP_y-axis_coordinates")+
      theme(axis.text.x= element_blank())+
      theme(axis.text.y = element_blank())+
      theme(axis.line = element_line(colour = "black",
                                 size = 0.1,
                                 linetype = "solid"))+
      labs(title = "UMAP visulisaiton for model")+
      geom_density_2d()+
      stat_density_2d(aes(fill=..level..), geom = "polygon")+
      scale_fill_gradient(low = "blue", high = "red")+
      facet_wrap(df.labels$treatmentsum~.)

  }


})

pdf("plots.pdf")

for (i in 1:length(plot)) { 
  print(plot[[i]]) 
  } 
dev.off()
ML33M
  • `para` is already a list of length 40. `map` passes each element of `para` to its function. Try removing the for loop in `map` and replacing `para[[i]]` with just `i`. – JohannesNE Jun 24 '20 at 06:01
  • Looping with ggplot is always kind of tricky. Maybe these previous posts will help; you might also break your problem down into a reproducible example, since that makes it easier to help you. Check [this](https://stackoverflow.com/a/54809084/9783433) answer and [this](https://stackoverflow.com/questions/15678261/ggplot-does-not-work-if-it-is-inside-a-for-loop-although-it-works-outside-of-it) question. – mischva11 Jun 24 '20 at 08:09
  • @JohannesNE holy sh*t, it worked! All the plots are in the PDF! But now I have a new problem; sorry, I should have thought about this beforehand. I need to figure out which plot is which among the 40 plots generated. Can I insert something in the ggplot title in the loop that shows which plot (the ith) this is? Even better, can I insert the pair of n_neighbors.test and min_dist.test values in the label? Something like `labs(title = "UMAP visulisaiton for model [i]")`. – ML33M Jun 24 '20 at 15:17
  • @mischva11 thank you. Yes, these posts deal with essentially the same problem. – ML33M Jun 24 '20 at 15:18
  • @JohannesNE, and yes, just inserting `labs(title = "UMAP visulisaiton for model [i]")` in the ggplot function in the loop gives me 40 plots with the same title, "UMAP visulisaiton for model [i]", haha. I thought the loop would replace every i with the actual iteration number. – ML33M Jun 24 '20 at 15:27

1 Answer


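The answer to the original problem is in the comments: remove the `for` loop inside `map()` and replace `para[[i]]` with `i`. `map()` already iterates over `para` and passes each data frame to the function as `i`. Because a `for` loop always evaluates to `NULL`, the function with the loop inside returned `NULL` for every element, which is why the plot list contained no plots.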
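A minimal base-R sketch of that behaviour, using two toy functions (hypothetical names, not from the question): one whose last expression is a `for` loop and one that returns its value directly.

```r
# the last expression is a for loop, so the function returns NULL
with_loop <- function(x) {
  for (j in 1:3) x * j   # values are computed, then thrown away
}

# the last expression is the value itself, so it is returned
without_loop <- function(x) {
  x * 3
}

res_loop    <- lapply(1:2, with_loop)
res_no_loop <- lapply(1:2, without_loop)

str(res_loop)     # a list of two NULLs
str(res_no_loop)  # a list holding 3 and 6
```

The same holds for `purrr::map()`, which, like `lapply()`, collects whatever the function returns.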

To add a title to each plot:

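R never substitutes variables inside a string literal, so the title has to be built with `paste()` or a similar function. One way is to map over `para` and the `n_neighbors` column of `hyper_grid` simultaneously with `map2()`, and use that value in the title. If I understand your code correctly, the following should work. Because `hyper_grid` has 5 × 8 = 40 rows, `hyper_grid$n_neighbors[1:40]` is the whole column, so the `[1:40]` subset is not strictly needed.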
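Extending this to both parameters, a sketch that precomputes one title per row of `hyper_grid` (rebuilt here from the question's grid) with `sprintf()`. It assumes `para` was filled in the same row order as `hyper_grid`, which holds for the loops in the question.

```r
n_neighbors.test <- seq(1, 100, 20)
min_dist.test    <- seq(0.05, 4, 0.5)
hyper_grid <- expand.grid(n_neighbors = n_neighbors.test, min_dist = min_dist.test)

# one title per model: %d takes the row index and the whole-number n_neighbors,
# %.2f takes min_dist rounded to two decimals
titles <- sprintf("UMAP model %d: n_neighbors = %d, min_dist = %.2f",
                  seq_len(nrow(hyper_grid)),
                  hyper_grid$n_neighbors,
                  hyper_grid$min_dist)

head(titles, 2)
```

The resulting `titles` could then be passed to `map2()` in place of `hyper_grid$n_neighbors[1:40]`, with `labs(title = title)` inside the function.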

library(purrr)    # map2()
library(ggplot2)

plot <- map2(para, hyper_grid$n_neighbors[1:40], function(param, n_neighbors) {
  ggplot(param, aes(x = UMAPX, y = UMAPY)) +
    geom_point() +
    scale_x_continuous(name = "UMAP_X-axis_coordinates") +
    scale_y_continuous(name = "UMAP_Y-axis_coordinates") +
    theme(axis.text.x = element_blank(),
          axis.text.y = element_blank(),
          axis.line = element_line(colour = "black",
                                   linewidth = 0.1,  # `size` before ggplot2 3.4.0
                                   linetype = "solid")) +
    labs(title = paste0("UMAP visualization for model w/ n_neighbors: ", n_neighbors)) +
    geom_density_2d() +
    stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
    scale_fill_gradient(low = "blue", high = "red") +
    facet_wrap(~ treatmentsum)  # treatmentsum is a column of every data frame in para
})
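Since individual plots were the end goal, a sketch that builds one plot per data frame and saves each to its own PDF named after its parameters. It assumes each data frame carries `n_neighbors` and `min_dist` columns, which the question's `para` does not; the two toy data frames below (random points, hypothetical values) are built that way only to make the example runnable. Base R's `Map()` is used here, but `purrr::map2()` works the same way.

```r
library(ggplot2)

# hypothetical toy input: two small data frames shaped like elements of `para`
set.seed(1)
make_toy <- function(n_nb, md) {
  data.frame(UMAPX = rnorm(200), UMAPY = rnorm(200),
             treatmentsum = rep(c("A", "B"), each = 100),
             n_neighbors = n_nb, min_dist = md)
}
para <- list(make_toy(21, 0.05), make_toy(41, 0.55))

out_dir <- tempdir()

# build, label and save one plot per parameter pair; returns the file paths
files <- unlist(Map(function(d, i) {
  nb <- d$n_neighbors[1]
  md <- d$min_dist[1]
  p <- ggplot(d, aes(x = UMAPX, y = UMAPY)) +
    stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
    geom_point(size = 0.3) +
    scale_fill_gradient(low = "blue", high = "red") +
    facet_wrap(~ treatmentsum) +
    labs(title = sprintf("Model %d: n_neighbors = %g, min_dist = %g", i, nb, md))
  f <- file.path(out_dir, sprintf("umap_nn%g_md%g.pdf", nb, md))
  ggsave(f, p, width = 7, height = 5)
  f
}, para, seq_along(para)))
```

Putting the parameter values in the file name makes each plot identifiable without opening it.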
JohannesNE
  • Hi, thank you for the response. I think we must have quite a time difference, lol; I posted that at around 2 am my time, hence our replies are far apart. I have tried the code and it works in a similar way. I think it's quite clever to use map2. I have decided to generate individual plots instead of mashing them together. Using this as a base, I can also tweak things for other ideas. Thank you :) – ML33M Jun 25 '20 at 20:18