I read this post before asking, but my question is more specific.
library(ggplot2)
library(scales)
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))
I replaced my real data with dat. With this random seed, x and y both fall within [-4, 4]. I partition that area into 256 (16 * 16) cells, each 0.5 wide, and I want the count for each cell.
That part is easy: geom_bin2d can do it.
# plot, using 0.5 x 0.5 cells (the default would be 30 bins per axis)
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d(binwidth = c(0.5, 0.5))
# Get data - this includes counts and x,y coordinates
newdat <- ggplot_build(p)$data[[1]]
# add in text labels
p + geom_text(data=newdat, aes((xmin + xmax)/2, (ymin + ymax)/2,
label=count), col="white")
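The per-cell counts can also be tabulated without going through ggplot_build(). A minimal sketch in base R, assuming cell edges at multiples of 0.5 on [-4, 4]; geom_bin2d may place its bin edges differently, so individual counts need not match the plot. The names breaks, counts and cell_counts are only illustrative.

```r
set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

# 17 break points give 16 intervals of width 0.5 on each axis (assumed grid)
breaks <- seq(-4, 4, by = 0.5)

# Cross-tabulate the x and y intervals; points outside [-4, 4] would be dropped
counts <- table(
  x = cut(dat$x, breaks, include.lowest = TRUE),
  y = cut(dat$y, breaks, include.lowest = TRUE)
)
dim(counts)  # 16 16

# Long format: one row per cell, including cells with a count of 0
cell_counts <- as.data.frame(counts, responseName = "count")
head(cell_counts[order(-cell_counts$count), ])
```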
So far so good, but I only want to keep the 100 cells with the highest counts and plot those, as in the picture below.
According to ?geom_bin2d, drop = TRUE only removes cells with a count of 0, whereas I care about the 100 highest counts. How can I do this? This is question 1.
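One possible direction for question 1, sketched under assumptions rather than offered as a confirmed solution: pull the binned cells out of the built plot, sort them by count, keep the first 100, and draw those rectangles directly. The 0.5 bin width, n_top and top_cells are assumptions made for illustration; the column names xmin, xmax, ymin, ymax and count are the ones already used in the labelling code above.

```r
library(ggplot2)

set.seed(1)
dat <- data.frame(x = rnorm(1000), y = rnorm(1000))

# Build the binned plot once so the per-cell counts can be extracted from it
p <- ggplot(dat, aes(x = x, y = y)) + geom_bin2d(binwidth = c(0.5, 0.5))
cells <- ggplot_build(p)$data[[1]]

# Sort cells by count (highest first) and keep at most n_top of them
n_top <- 100
top_cells <- head(cells[order(cells$count, decreasing = TRUE), ], n_top)

# Redraw only the retained cells, with the count printed in each one
ggplot(top_cells) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax,
                fill = count)) +
  geom_text(aes((xmin + xmax) / 2, (ymin + ymax) / 2, label = count),
            colour = "white")
```

Note that head() cuts ties at the boundary arbitrarily: if several cells share the 100th-highest count, only some of them are kept.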
Please also look at the legend of the second picture. The counts there are small and close together, but what if they were 10,000, 20,000, 30,000?
The usual approach is the trans argument of scale_fill_gradient. The built-in transformations include exp, log, sqrt, and so on, but I want to divide by 1,000. I found trans_new() in the scales package and tried it, but it did not work.
sci_trans <- function(){ trans_new('sci', function(x) x/1000, function(x) x*1000)}
p + scale_fill_gradient(trans='sci')
This is question 2. I have searched a lot but cannot find a solution. Thanks in advance to anyone who can help!
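A possible direction for question 2, again only a sketch under assumptions: a linear rescaling such as x / 1000 does not change how counts map to colours, so it may be enough to change the legend labels through the labels argument of scale_fill_gradient instead of defining a transformation. The data frame big, the helper thousands and the legend title are hypothetical and exist only to produce counts in the tens of thousands.

```r
library(ggplot2)

set.seed(1)
# Hypothetical larger sample so that per-cell counts reach the thousands
big <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

# Hypothetical helper: show legend values in thousands
thousands <- function(x) x / 1000

p_k <- ggplot(big, aes(x = x, y = y)) +
  geom_bin2d(binwidth = c(0.5, 0.5)) +
  scale_fill_gradient(name = "count (thousands)", labels = thousands)

p_k
```

The fill values themselves stay as raw counts; only the text printed next to the legend keys is divided by 1,000.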