4

I am attempting to create an interactive plot of hexbins where a user can click on a given hexbin and receive a list of all observations of the original data frame that were grouped in that clicked hexbin.

Below is a MWE that seems pretty close to my goal. I am using Shiny, hexbin(), and ggplotly.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  # I have tried different methods of generating the ggplot object
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, colours = ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, colours = ID, aes(x=x, y=y, colours = ID, fill = counts)) + geom_hex(stat="identity")
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")

  output$plot <- renderPlotly({
    ggplotly(p)
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      str(d())
      #Next line would deliver all observations from original data frame (bindata) that are in the clicked hexbin... if d() from event_data() was returning ID instead of curveNumber
      #bindata[which(h@cID==d()$curveNumber),]
    }
  })
}

shinyApp(ui, server)

Inside the h@cID object is the ID for all data points (showing which data point goes into which hexbin). Hence, I feel that if I am able to get event_data() to return the hexbin ID when a user clicks, then I should be able to successfully map that hexbin ID back to the h@cID object to obtain the corresponding data points.

Unfortunately, the way I have it written currently, event_data() will return "curveNumber" which does not seem to equal ID. It also does not seem to translate into ID (even when using all the information in the h object - not just h@cID, but also more such as h@xcm, h@ycm, etc.)

Is there any method anyone is aware of to solve this type of problem? Any ideas would be appreciated!

Note: My two most recent posts (including a bounty) are very similar to this question. They are located here (Interactive selection in ggplotly with geom_hex() scatterplot) and (Obtain observations in geom_hex using plotly and Shiny). The difference is that I have been making the problem more simple each step. Thank you.

Edit - Possible Answer

I think I may have obtained the solution for this problem. Like what @oshun noticed, there is some hidden conversion between the curveNumber returned from event_data() and the hexbin ID. It seems that curveNumbers are first sorted from smallest to largest by increasing count of the hexbins. Then, within a given count, it seems that curverNumber is further sorted from smallest to largest by increasing ID. However, the ID is sorted by character (not number). For instance, the number 18 would be considered smaller than the number 2 because 18 starts with the digit 1 which is smaller than the digit 2.

You can see this pattern when the full dataset in this example is represented with count, ID, and curveNumber below:

count=1 (ID=24) —> curveNumber 0
count=1 (ID=26) —> curveNumber 1
count=1 (ID=34) —> curveNumber 2
count=1 (ID=5) —> curveNumber 3
count=1 (ID=7) —> curveNumber 4
count=2 (ID=11) —> curveNumber 5
count=2 (ID=14) —> curveNumber 6
count=2 (ID=19) —> curveNumber 7
count=2 (ID=23) —> curveNumber 8
count=2 (ID=3) —> curveNumber 9
count=2 (ID=32) —> curveNumber 10
count=2 (ID=4) —> curveNumber 11
count=3 (ID=10) —> curveNumber 12
count=3 (ID=13) —> curveNumber 13
count=3 (ID=33) —> curveNumber 14
count=3 (ID=40) —> curveNumber 15
count=3 (ID=9) —> curveNumber 16
count=4 (ID=17) —> curveNumber 17
count=4 (ID=20) —> curveNumber 18
count=5 (ID=28) —> curveNumber 19 
count=5 (ID=8) —> curveNumber 20
count=6 (ID=21) —> curveNumber 21 
count=8 (ID=27) —> curveNumber 22 
count=9 (ID=22) —> curveNumber 23 
count=11 (ID=16)—> curveNumber 24
count=14 (ID=15)—> curveNumber 25

Below is my tentative solution for this problem. I am pretty sure it works for this this dataset, but I plan to test it on more datasets to be sure.

app.R

library(shiny)
library(plotly)
library(data.table)
library(GGally)
library(reshape2)
library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  verbatimTextOutput("click")
)

server <- function(input, output, session) {

  # Curve number to ID
  cnToID <- function(h){
    df <- data.frame(table(h@cID))
    colnames(df) <- c("ID","count")
    cnID <- df[order(df$count,as.character(df$ID)),]
    cnID$curveNumber <- seq(0, nrow(cnID)-1)
    return(cnID)
  }

  # Create data
  set.seed(1)
  bindata <- data.frame(x=rnorm(100), y=rnorm(100))
  h <- hexbin (bindata, xbins = 5, IDs = TRUE, xbnds = range (bindata$x), ybnds = range (bindata$y))
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)
  p <- ggplot(hexdf, aes(x=x, y=y, fill = counts, ID=ID)) + geom_hex(stat="identity")
  #p <- ggplot(hexdf, aes(x=x, y=y, fill = counts), ID=ID) + geom_hex(stat="identity")
  cnID <- cnToID(h)

  output$plot <- renderPlotly({
    p2 <- ggplotly(p)
    for (i in 1:nrow(hexdf)){
      p2$x$data[[i]]$text <- gsub("<.*$", "", p2$x$data[[i]]$text)
    }
    p2
  })

  d <- reactive(event_data("plotly_click"))

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      clickID <- as.numeric(as.character(cnID[which(cnID$curveNumber==d()$curveNumber),]$ID))
      clickID
      bindata[which(h@cID==clickID),]
    }
  })
}

shinyApp(ui, server)

Edit 2:

Community
  • 1
  • 1
  • 1
    Wow, you figured it out. Hope that allows you to solve your more difficult questions. Could you link to them so future readers can follow your thought process and find them easily? – oshun Jan 17 '17 at 17:39

1 Answers1

1

Simplified your questions enough that I can give you a partial answer. The code below allows you to click on binned data (plotted as squares) and get the original data.

Plotly returns information on click events in the form of x, y, curveNumber and pointNumber. curveNumber indexes the trace, but this seems to vary depending on how plotly is called. pointNumber appears to index according to the order of the data (and it is also linked to curveNumber). If only one group of points is plotted, this is relatively straightforward to map to the original data.

The solution below works with points because it uses pointNumber (x and y is probably a better lookup combo because these are absolute values instead of a relative order). The solution does not work with geom_hex hexagons as you originally requested because only curveNumber is returned with a mouse-click. It looks hexagons are added first by count then by some other sorting variable. Solving the rationale behind the curveNumber numbering is the key if you want to use geom_hex.

Below are two screengrabs: Left = Original plot with geom_hex. Right = Modified plot with geom_point using pointNumber to correctly index results.

plotly curveNumber issues

Modified code is below. Both the OP and I borrow heavily from this answer about hexbins.

library(shiny); library(plotly); library(GGally); library(reshape2); library(hexbin)

ui <- fluidPage(
  plotlyOutput("plot"),
  checkboxInput("squarePoints", label = "Switch to points?"),
  verbatimTextOutput("click"),
  HTML("Check the work:"),
  plotlyOutput("plot1")
)

server <- function(input, output, session) {
  #Create data
  set.seed(1)
  bindata <- data.frame(myIndex = factor(paste0("ID",1:100)), 
                        x=rnorm(100), y=rnorm(100))

  h <- hexbin (bindata[,2:3], xbins = 5, IDs = TRUE, 
               xbnds = range(bindata$x), ybnds = range(bindata$y))

  # As we have the cell IDs, we can merge this data.frame with the proper coordinates
  hexdf <- data.frame (hcell2xy (h),  ID = h@cell, counts = h@count)

  #New code added below ###
  counts <- hexTapply(h, bindata$myIndex, table)  #list of 26
  counts <- t(simplify2array (counts))
  counts <- melt (counts)                 #2600 rows = 26 hexagons * 100 observations
  colnames (counts)  <- c ("ID", "myIndex", "present")

  allhex <- merge (counts, hexdf)         #2600 rows = 26 hexagons * 100 observations
  #rename hex coordinates
  names(allhex)[names(allhex) %in% c("x", "y")] <- c("hex.x", "hex.y")  
  allhex <- merge(allhex, bindata)
  somehex <- allhex[allhex$present > 0,]  #100 rows (original data)

  #Plotly graphs objects in a certain order, so sort the lookup data by the same order 
  #in which it's plotted.
  #No idea how curveNumber plots data. First by counts, then by ...?
  #pointNumber seems more straightforward. 
  sorthex <- hexdf[with(hexdf, order(ID)), ]

  #Create a switch to change between geom_hex() and geom_point()
  switchPoints <- reactive(if(input$squarePoints) {
    geom_point(shape = 22, size = 10)
    } else {  
      geom_hex(stat = "identity")
      })

  hexdf$myIndex <- "na" #Added here for second plotly
  ### New code added above ###

  p <- reactive(ggplot(hexdf, aes(x=x, y=y, fill = counts))  + coord_equal() +
                switchPoints() )

  output$plot <- renderPlotly({
    ggplotly(p())
  })

  d <- reactive(event_data("plotly_click"))
  #pointNumber = index starting from 0
  hexID <- reactive(sorthex[d()$pointNumber + 1, "ID"]) 

  output$click <- renderPrint({
    if (is.null(d())){
      "Click on a state to view event data"
    }
    else{
      list(
      str(d()),
      somehex[somehex$ID == hexID(),]
      )
    }
  })

  #Check your work: plot raw data over hexagons
  p.check <- ggplot(hexdf, aes(x=x, y=y, fill = counts)) + geom_hex(stat="identity") +
    geom_point(data = somehex, aes(x=x, y=y)) + coord_equal()

  output$plot1 <- renderPlotly({
    ggplotly(p.check + aes(label= myIndex) )
  })


}

shinyApp(ui, server)
Community
  • 1
  • 1
oshun
  • 2,319
  • 18
  • 32
  • Thank you for your input again, @oshun. I am definitely keeping some of your ideas in mind. I think I may have figured out a solution for this problem (posted it in edit). –  Jan 16 '17 at 19:20