4

I have a data frame with 10k rows and 3 columns: xpos, ypos and cluster (cluster is a number from 0 to 9) here: http://pastebin.com/NyQw29tb

I would like to show a hex plot with each hexagon colored according to the most-frequent cluster within that hexagon.

So far I've got:

 library(ggplot2)
 library(hexbin)
 ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) + stat_summary_hex(fun.x=mode)

Which I think is giving me what I want (i.e. is filling in every hexagon with a color from 0 to 9), but the color scale appears continuous, and I can't figure out how to make it use a discrete one.

output

For extra context, here's the underlying, messier view of the data, which I'm trying to smooth out by using hexagons:

 qplot(data=clusters, xpos, ypos, color=factor(cluster))

output2

Has QUIT--Anony-Mousse
nicolaskruchten

2 Answers

4

I don't know what your stat_summary_hex(fun.x=mode) is doing, but I'm pretty sure it's not what you think: mode gives the storage mode of an object, not the statistical mode, and fun.x doesn't match any formal argument of stat_summary_hex. Try this instead. It tabulates the observations in each bin and pulls out the label with the maximum count.

ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) + stat_summary_hex(fun = function(x) {
    tab <- table(x)
    names(tab)[which.max(tab)]
})

Hexbinned clusters
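
To see the difference outside of a plot, here is a minimal base-R sketch. The vector x is made-up data standing in for the cluster values that fall into one hexagon.

```r
# Toy cluster labels for one hypothetical hexagon (made-up data)
x <- c(3, 1, 3, 2, 3, 1)

# base::mode() reports how the vector is stored, not its most frequent value
storage <- mode(x)

# Tabulate the values, then take the label with the highest count
tab <- table(x)
most_frequent <- names(tab)[which.max(tab)]

print(storage)        # "numeric"
print(most_frequent)  # "3"
```

Because names() returns a character vector, the summary value is a label rather than a number, which is what lets the fill be treated as discrete.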

Hong Ooi
1

I believe there are two problems here. First, mode is not the function you want (check the help--it's to "Get or set the type or storage mode of an object"). Second, the parameter is fun= rather than fun.x= for stat_summary_hex.

There's a nice discussion of mode functions here. The recommended function is:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
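
One detail worth knowing is how ties are broken. The sketch below uses a made-up vector in which two values are equally frequent, and compares Mode with a table()-based lookup. It suggests that Mode keeps the first value encountered, while the table() approach keeps the smallest label, because table() sorts its levels.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up data: 3 and 1 each appear twice
x <- c(3, 1, 3, 2, 1)

# unique() preserves order of appearance, so the first-seen value wins the tie
first_seen <- Mode(x)

# table() sorts its levels, so the smallest tied label wins instead
tab <- table(x)
smallest_label <- names(tab)[which.max(tab)]

print(first_seen)      # 3
print(smallest_label)  # "1"
```

For a hex plot this only matters in hexagons where two clusters are exactly tied, but there the two approaches can color the same hexagon differently.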

Finally, you want to make sure that the fill for the hexagons is treated as a discrete value. You can modify the fun function so that the return value is a character (as in the code below).

Here is a reproducible example:

library(ggplot2)
library(hexbin)
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
clusters <- data.frame(xpos = rnorm(1000), ypos = rnorm(1000), cluster = rep(1:9, length.out = 1000))
ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) +
  stat_summary_hex(fun=function(x){as.character(Mode(x))})
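
As a further sketch, an explicit discrete fill scale can be added to title the legend. The helper name most_frequent_label, the seed, and the random data are assumptions made for illustration, not part of the original question.

```r
library(ggplot2)
library(hexbin)  # stat_summary_hex() needs the hexbin package installed

# Hypothetical helper: most frequent value in x, returned as a character label
most_frequent_label <- function(x) {
  tab <- table(x)
  names(tab)[which.max(tab)]
}

set.seed(42)  # arbitrary seed so the made-up data is repeatable
clusters <- data.frame(
  xpos = rnorm(1000),
  ypos = rnorm(1000),
  cluster = sample(0:9, 1000, replace = TRUE)
)

p <- ggplot(clusters, aes(x = xpos, y = ypos, z = cluster)) +
  stat_summary_hex(fun = most_frequent_label) +
  # A discrete scale only accepts discrete values, so this should fail at
  # draw time if the summary function returned numbers instead of labels
  scale_fill_discrete(name = "Most frequent cluster")

print(p)
```

Adding the discrete scale explicitly also acts as a check: if the summary function ever returns a number, ggplot2 should complain rather than silently fall back to a continuous gradient.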

I hope this helps.

Community
jflournoy