
I have a hexbin plot (image not included here) whose legend shows the raw count per hexagon. I would like the legend to show percentages instead. I generated the plot with the following script:

  library("ggplot2")
  library("RColorBrewer")
  df <- read.csv("/home/adam/Desktop/data_norm.csv")
  d <- ggplot(df, aes(case1, case2)) + geom_hex(bins = 30) + theme_bw() +
    theme(text = element_text(face = "bold", size = 16)) + xlab("Case 2") + ylab("Case 1") 
  d <- d + scale_fill_gradientn(colors = brewer.pal(3,"Dark2"))

Here is a reproducible sample of the data, generated with dput():

df <- structure(list(ID = c(14L, 15L, 38L, 6L, 7L, 1L, 32L, 31L, 
17L, 30L, 19L, 24L, 5L, 5L, 7L, 8L, 35L, 4L, 1L, 6L, 45L, 58L, 
59L, 5L, 11L, 29L, 6L, 7L, 22L, 23L, 3L, 4L, 25L, 3L, 20L, 16L, 
21L, 109L, 108L, 54L, 111L, 105L, 114L, 28L, 27L, 2L, 24L, 26L, 
50L, 49L, 51L, 48L, 56L, 54L, 53L, 55L, 57L, 52L, 25L, 22L, 34L, 
23L, 19L, 38L, 39L, 18L, 13L, 27L, 11L), case1 = c(2L, 0L, 
0L, 0L, 4L, 17L, 11L, 7L, 9L, 11L, 14L, 5L, 1L, 0L, 0L, 0L, 1L, 
0L, 0L, 0L, 0L, 0L, 0L, 26L, 0L, 16L, 0L, 0L, 6L, 4L, 1L, 10L, 
3L, 13L, 13L, 12L, 6L, 0L, 0L, 11L, 0L, 0L, 0L, 0L, 3L, 16L, 
4L, 3L, 0L, 0L, 0L, 11L, 0L, 0L, 0L, 0L, 0L, 8L, 5L, 7L, 8L, 
7L, 4L, 0L, 1L, 15L, 2L, 19L, 2L), case2 = c(30L, 0L, 0L, 
0L, 30L, 30L, 29L, 29L, 29L, 29L, 29L, 29L, 30L, 30L, 30L, 30L, 
30L, 30L, 30L, 30L, 0L, 29L, 25L, 30L, 30L, 29L, 0L, 0L, 29L, 
29L, 30L, 30L, 30L, 30L, 29L, 29L, 29L, 0L, 3L, 29L, 16L, 14L, 
0L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 30L, 23L, 29L, 30L, 
30L, 30L, 30L, 30L, 30L, 30L, 30L, 29L, 0L, 30L, 29L, 30L, 29L, 
30L)), class = "data.frame", row.names = c(NA, -69L))

How can I change the script so that the legend shows the counts as percentages rather than raw numbers?

Adam Amin

1 Answer


To change how a scale's labels are displayed without changing the underlying values, pass a formatting function to the labels argument of any scale_* function. First, build the base plot without the fill scale (this assumes ggplot2 and RColorBrewer are loaded):

plot <- ggplot(df, aes(case1, case2)) + geom_hex(bins = 30) +
            theme_bw() +
            theme(text = element_text(face = "bold", size = 16)) +
            xlab("Case 2") +
            ylab("Case 1")

To convert each count to a proportion of the total, divide it by the number of rows in df:

plot + scale_fill_gradientn(colors = brewer.pal(3,"Dark2"),
                            labels = function(x) x/nrow(df))
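
To make the arithmetic concrete, here is a minimal sketch of what that labelling function does to a few legend values. It assumes 69 rows, matching the dput() sample in the question; n_rows, count_breaks and to_proportion are names introduced only for this illustration.

```r
# Assumption: 69 rows, as in the dput() sample from the question.
n_rows <- 69

# Hypothetical legend values, in raw counts per hexagon.
count_breaks <- c(5, 10, 15, 20)

# The same transformation that is passed to labels= above.
to_proportion <- function(x) x / n_rows

# Proportions of the total that would be printed in the legend.
round(to_proportion(count_breaks), 3)
```

A count of 10, for example, is labelled as roughly 0.145. Only the text of the labels changes; the fill colours are still mapped from the raw counts.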


The answers to "How can I change the Y-axis figures into percentages in a barplot?" show several ways to format these proportions as proper percentages. The easiest is percent() from the scales package, which is installed as a dependency of ggplot2:

plot + scale_fill_gradientn(colors = brewer.pal(3,"Dark2"),
                            labels = function(x) scales::percent(x/nrow(df)))
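
An equivalent approach, sketched here on the assumption that your installed version of scales provides label_percent(), is to build the labelling function once and let its scale argument do the division. The names n_rows and count_to_percent are hypothetical, and 69 again stands in for nrow(df).

```r
library(scales)

# Assumption: nrow(df) for the sample data in the question.
n_rows <- 69

# label_percent() returns a function. scale = 100 / n_rows turns a raw
# count into a percentage of all rows before the "%" suffix is added;
# accuracy = 1 rounds to whole percentages.
count_to_percent <- label_percent(accuracy = 1, scale = 100 / n_rows)

count_to_percent(c(10, 20))
```

The resulting function can then be passed directly as labels = count_to_percent in scale_fill_gradientn().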


If you want the legend to show specific round percentages, set breaks as well. The breaks must be given in the scale's original units (counts), not as the transformed percentages, so apply the inverse of the conversion used in labels:

plot + scale_fill_gradientn(colors = brewer.pal(3,"Dark2"),
                            labels = function(x) scales::percent(x/nrow(df)),
                            breaks = c(.05, .1, .15, .2, .25, .3) * nrow(df))
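
The round trip between percentages and counts can be checked outside ggplot2. This is a small base R sketch, again assuming 69 rows as in the question's sample; pct_breaks, count_breaks and relabelled are illustrative names only.

```r
# Assumption: 69 rows, as in the dput() sample from the question.
n_rows <- 69

# Desired legend entries, expressed as proportions of the total.
pct_breaks <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30)

# Inverse of the labels= conversion: back to the count units that
# the fill scale actually uses for its breaks.
count_breaks <- pct_breaks * n_rows

# Forward conversion again, to confirm each break is labelled
# with the round percentage that was asked for.
relabelled <- paste0(round(100 * count_breaks / n_rows), "%")
relabelled
```

The breaks themselves are fractional counts (5% of 69 rows is 3.45), which is fine for a continuous fill scale because they only mark positions on the legend.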


divibisan