
I am creating a scatter plot matrix using GGally::ggpairs. I am using a custom function (below called my_fn) to create the bottom-left non-diagonal subplots. In the process of calling that custom function, there is information about each of these subplots that is calculated, and that I would like to store for later.

In the example below, each h@cID is an integer vector (int[]) with 100 values. It is created 10 times in my_fn, once for each of the 10 bottom-left non-diagonal subplots. I am trying to store all 10 of these h@cID vectors in the listCID object.

I have not had success with this approach, and I have tried a few other variants (such as trying to put listCID as an input parameter to my_fn, or trying to return it in the end).

Is it possible for me to store the ten h@cID structures efficiently through my_fn to be used later? I feel there are several syntax issues that I am not entirely familiar with that may explain why I am stuck, and likewise I would be happy to change the title of this question if I am not using appropriate terminology. Thank you!

library(hexbin)
library(GGally)
library(ggplot2)

set.seed(1)

bindata <- data.frame(
    ID = paste0("ID", 1:100), 
    A = rnorm(100), B = rnorm(100), C = rnorm(100), 
    D = rnorm(100), E = rnorm(100))
bindata$ID <- as.character(bindata$ID)

maxVal <- max(abs(bindata[ ,2:6]))
maxRange <- c(-1 * maxVal, maxVal)

listCID <- c()

my_fn <- function(data, mapping, ...){
  x <- data[ ,c(as.character(mapping$x))]
  y <- data[ ,c(as.character(mapping$y))]
  h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE, 
              xbnds=maxRange, ybnds=maxRange)
  hexdf <- data.frame(hcell2xy(h),  hexID=h@cell, counts=h@count)
  listCID <- c(listCID, h@cID)
  print(listCID)
  p <- ggplot(hexdf, aes(x=x, y=y, fill=counts, hexID=hexID)) + 
            geom_hex(stat="identity")
  p
}

p <- ggpairs(bindata[ ,2:6], lower=list(continuous=my_fn))
p


David C.
    You could add the extra info as an attribute. After the line `hexdf <- data.frame(...)`, use `attr(hexdf, "cID") <- h@cID` (and remove the two listCID lines of code). You can then inspect the individual plots, e.g. `str(p[2,1])`, and extract the values with `attr(p[2,1]$data, "cID")`. – user20650 Jan 23 '17 at 18:41
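
A minimal sketch of the attribute idea from the comment above, reduced to a single ggplot so it does not need hexbin or GGally. The data values are made up, and the integer vector is only a stand-in for h@cID. It assumes that a ggplot object keeps the data frame it was given, custom attributes included, which is what the comment relies on:

```r
library(ggplot2)

# Stand-in for hexdf; the values are hypothetical.
hexdf <- data.frame(x = c(0, 1), y = c(0, 1), counts = c(2L, 1L))

# Stand-in for h@cID: attach it to the data frame as an attribute.
attr(hexdf, "cID") <- c(1L, 1L, 2L)

p <- ggplot(hexdf, aes(x = x, y = y)) + geom_point()

# Read the attribute back from the data stored in the plot.
cid <- attr(p$data, "cID")
```

With ggpairs, the same lookup would go through an individual panel, as in `attr(p[2,1]$data, "cID")`.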

2 Answers


If I understand your problem correctly, this is quite easily, albeit inelegantly, achieved using the <<- operator.

With it you can assign to a variable that lives outside the function's own scope, such as a global variable.

Set listCID <- NULL before executing the function and use listCID <<- c(listCID, h@cID) inside the function.

listCID <- NULL

my_fn <- function(data, mapping, ...){
  x <- data[ ,c(as.character(mapping$x))]
  y <- data[ ,c(as.character(mapping$y))]
  h <- hexbin(x=x, y=y, xbins=5, shape=1, IDs=TRUE,
              xbnds=maxRange, ybnds=maxRange)
  hexdf <- data.frame(hcell2xy(h), hexID=h@cell, counts=h@count)

  # <<- assigns to listCID in the enclosing (here: global) environment
  if (exists("listCID")) listCID <<- c(listCID, h@cID)

  print(listCID)
  p <- ggplot(hexdf, aes(x=x, y=y, fill=counts, hexID=hexID)) +
            geom_hex(stat="identity")
  p
}

For more on scoping, please refer to Hadley's excellent Advanced R: http://adv-r.had.co.nz/Environments.html
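
Note that c(listCID, h@cID) flattens everything into one long integer vector (1000 values if my_fn runs once for each of the ten panels). If you would rather keep one vector per panel, a variation is to have <<- write into a named list instead. A minimal sketch in base R; cid_store and record_cid are hypothetical names, and the short vectors are stand-ins for h@cID:

```r
# Global store: one entry per panel, keyed by a panel name.
cid_store <- list()

record_cid <- function(key, cid) {
  # <<- modifies cid_store in the enclosing (global) environment.
  cid_store[[key]] <<- cid
  invisible(cid)
}

record_cid("A_B", c(3L, 7L, 7L))  # stand-in for one panel's h@cID
record_cid("A_C", c(1L, 2L, 5L))  # stand-in for another panel's h@cID

names(cid_store)
```

Because the entries are keyed, a panel function that happens to be called a second time overwrites its own entry instead of appending a duplicate.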

Pewi

In general it is not a good practice to try to return two different results with one function. In your case, you want to return the plot and the result of a calculation (the hexbin cIDs).

Better would be to calculate your results in steps. Each step would be a separate function. The result of the first function (calculating the hexbins) can then be used as an input for multiple follow-up functions (finding the cIDs and creating the plot). Next follows one of the many ways in which you could refactor your code:

  • calc_hexbins() in which you generate all the hexbins. This function could return a named list of hexbins (e.g. list(AB = h1, AC = h2, BC = h3)). This is achieved by enumerating all the pairwise combinations of your columns (A, B, C, D and E). The drawback is that you are duplicating some of the logic that is already in ggpairs().
  • gen_cids() takes the hexbins as an input and generates all the cIDs. This is a simple operation where you loop (or lapply) over all the elements in your list and take the cID.
  • create_plot() also takes the hexbins as an input and this is the function in which you actually generate the plot. Here you can add an extra parameter for the list of hexbins (there is a function wrap() in your package GGally to do this). Instead of calculating the hexbins, you can look them up in the named list that you've generated earlier by combining the names of the x and y columns (e.g. A and B) in a string.

This avoids hacky methods such as working with attributes or using global variables. These work of course, but are often a headache when maintaining code. Unfortunately, this will also make your code a little longer, but this can be a good thing.
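
The three steps above could be sketched roughly as follows. This is one possible layout, not a definitive implementation: the "A_B" naming scheme and the bnds parameter are my own choices, and reading the column names with rlang::as_name() is an assumption about how ggpairs passes the mapping (the question used as.character(mapping$x) instead).

```r
library(hexbin)
library(GGally)
library(ggplot2)

set.seed(1)
bindata <- data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100),
                      D = rnorm(100), E = rnorm(100))
maxVal <- max(abs(bindata))
maxRange <- c(-maxVal, maxVal)

# Step 1: one hexbin per unordered pair of columns, named e.g. "A_B".
calc_hexbins <- function(data, bnds) {
  pairs <- combn(names(data), 2, simplify = FALSE)
  hb <- lapply(pairs, function(pr) {
    hexbin(x = data[[pr[1]]], y = data[[pr[2]]], xbins = 5, shape = 1,
           IDs = TRUE, xbnds = bnds, ybnds = bnds)
  })
  names(hb) <- vapply(pairs, paste, character(1), collapse = "_")
  hb
}

# Step 2: pull the cID vector out of every hexbin.
gen_cids <- function(hexbins) lapply(hexbins, function(h) h@cID)

# Step 3: panel function that looks up a precomputed hexbin by name.
create_plot <- function(data, mapping, hexbins, ...) {
  key <- paste(rlang::as_name(mapping$x), rlang::as_name(mapping$y),
               sep = "_")
  h <- hexbins[[key]]
  hexdf <- data.frame(hcell2xy(h), hexID = h@cell, counts = h@count)
  ggplot(hexdf, aes(x = x, y = y, fill = counts, hexID = hexID)) +
    geom_hex(stat = "identity")
}

hexbins <- calc_hexbins(bindata, maxRange)
listCID <- gen_cids(hexbins)

# wrap() passes the extra hexbins argument through to create_plot().
p <- ggpairs(bindata,
             lower = list(continuous = wrap(create_plot, hexbins = hexbins)))
```

listCID is then a named list with one cID vector per pair of columns, available whether or not the plot is ever drawn. The lookup relies on the lower panels using the earlier column as x, which matches the order that combn() produces.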

takje