
I'm generating a scatter plot comparing two variables in a data frame, where the color of each point is determined by its type (another column in the data frame). However, many points overlap, so I want an estimate of how many points fall in a given part of the grid (for example, with the hexbin package one can estimate how many values fall in a given hexagon of the grid).

My question is: how can I use the hexbin package so that each group gets its own color scale? That way it would be possible both to distinguish between the groups and to estimate how many values there are in each hexagon.

I searched online but didn't find a satisfying solution. All the options I found focused only on distinguishing between the groups.

My code so far:

ggplot(data_for_scatter_plot, aes(x = Log2FoldChange, y = TT_frequency, color = factor(Type))) +
  geom_point(alpha = 0.6)

where my data_for_scatter_plot data frame is:

 Gene             Log2FoldChange  length  TT_frequency  Type
 ENSG00000007968  1.928153        24791   0.05623008    up_regulated
 ENSG00000009724  2.209263        20711   0.05842306    down_regulated
 ENSG00000010219  1.794972        53099   0.08250626    other_genes
 ENSG00000053438  3.815411        2479    0.10851150    up_regulated

And the graph that I get is:

scatterplot

While I want to get the following graph for each group in a different color scale:

hexbin plot

camille
Elizabeth
  • Short answer is no: you can only use one scale per aesthetic (such as fill). Longer answer is that folks have done some workarounds before, often by using opacity to mimic lighter gradients. Here are a couple: https://stackoverflow.com/q/46333719/5325862, https://stackoverflow.com/q/55167464/5325862 – camille Mar 24 '19 at 15:38
  • Here's another one with several ideas: https://stackoverflow.com/q/50163072/5325862 – camille Mar 24 '19 at 16:03
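
The opacity workaround mentioned in the comments above can be sketched roughly as follows. This is only a sketch under assumptions: the `toy` data frame and its column names are made up for illustration, and `after_stat()` assumes ggplot2 3.3.0 or later (older versions wrote `..count..` instead).

```r
library(ggplot2)  # geom_hex() also needs the hexbin package to be installed

set.seed(1)
# Hypothetical toy data: three partly overlapping groups (not the asker's data)
toy <- data.frame(
  x = c(rnorm(500, 0), rnorm(500, 1), rnorm(500, 2)),
  y = c(rnorm(500, 0), rnorm(500, 1), rnorm(500, 0)),
  Type = rep(c("up_regulated", "down_regulated", "other_genes"), each = 500)
)

p_alpha <- ggplot(toy, aes(x = x, y = y)) +
  # fill gives each group its own hue; alpha encodes the per-hexagon count,
  # so denser hexagons look more saturated within each group
  geom_hex(aes(fill = Type, alpha = after_stat(count)), bins = 20) +
  scale_alpha_continuous(range = c(0.2, 1))

p_alpha
```

Hexagons from different groups are still drawn on top of each other where the groups overlap, so this mimics separate gradients rather than providing truly independent scales.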

1 Answer


While this is not exactly what you asked for, it might be a push in the right direction.

I don't think multiple color or continuous scales are possible at the same time. You can, however, color the lines around the hexes by group and increase the line thickness.

First, some sample data (please provide this yourself in the future):

library(tidyverse)
library(hexbin)

set.seed(1)

data_for_scatter_plot <- 
  crossing(
    tibble(Trial = 1:100),
    tibble(Extra = 1:10),
    tribble( 
      ~Gene,              ~Log2FoldChange,  ~length,  ~TT_frequency,  ~Type, 
      "ENSG00000007968",  1.928153,         24791,    0.05623008,     "up_regulated",
      "ENSG00000009724",  2.209263,         20711,    0.05842306,     "down_regulated",
      "ENSG00000010219",  1.794972,         53099,    0.08250626,     "other_genes",
      "ENSG00000053438",  3.815411,         2479,     0.10851150,     "up_regulated")) %>% 
  mutate(
    # scale each value by Trial and add random noise so the points spread out
    Log2FoldChange = Log2FoldChange*0.001*Trial+rnorm(n=n(), mean=0, sd = 0.1),
    TT_frequency = TT_frequency-0.00001*Trial+rnorm(n=n(), mean=0, sd = 0.005)
  )


data_for_scatter_plot

# A tibble: 4,000 x 7
   Trial Extra Gene            Log2FoldChange length TT_frequency Type          
   <int> <int> <chr>                    <dbl>  <dbl>        <dbl> <chr>         
 1     1     1 ENSG00000007968        -0.0607  24791       0.0505 up_regulated  
 2     1     1 ENSG00000009724         0.0206  20711       0.0622 down_regulated
 3     1     1 ENSG00000010219        -0.0818  53099       0.0853 other_genes   
 4     1     1 ENSG00000053438         0.163    2479       0.102  up_regulated  
 5     1     2 ENSG00000007968         0.0349  24791       0.0461 up_regulated  
 6     1     2 ENSG00000009724        -0.0798  20711       0.0614 down_regulated
 7     1     2 ENSG00000010219         0.0505  53099       0.0754 other_genes   
 8     1     2 ENSG00000053438         0.0776   2479       0.117  up_regulated  
 9     1     3 ENSG00000007968         0.0595  24791       0.0654 up_regulated  
10     1     3 ENSG00000009724        -0.0283  20711       0.0653 down_regulated
# ... with 3,990 more rows

Now the plot:

data_for_scatter_plot %>% 
  ggplot(aes(x=Log2FoldChange, y=TT_frequency, color=factor(Type)))+
  geom_hex(size = 1)


I hope this helps.

---EDIT after comment---
I think a better solution for you is to facet. This also lets you distinguish overlapping groups:

data_for_scatter_plot %>% 
  ggplot(aes(x=Log2FoldChange, y=TT_frequency, color=factor(Type)))+
  geom_hex()+
  facet_grid(cols = vars(factor(Type)))

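
If the goal is also to read off how many values fall in each hexagon, one option is to pull the binned counts back out of a faceted plot with `layer_data()`. This is a minimal sketch that assumes a made-up `toy` data frame rather than the sample data above; the `bins = 20` setting is likewise an arbitrary choice for illustration.

```r
library(ggplot2)  # geom_hex() also needs the hexbin package to be installed

set.seed(1)
# Hypothetical toy data: three groups of 500 points each
toy <- data.frame(
  x = c(rnorm(500, 0), rnorm(500, 1), rnorm(500, 2)),
  y = c(rnorm(500, 0), rnorm(500, 1), rnorm(500, 0)),
  Type = rep(c("up_regulated", "down_regulated", "other_genes"), each = 500)
)

p_facet <- ggplot(toy, aes(x = x, y = y)) +
  geom_hex(bins = 20) +           # fill defaults to the per-hexagon count
  facet_grid(cols = vars(Type))   # one panel per group

# One row per drawn hexagon: centre (x, y), count, and the facet panel
hex_counts <- layer_data(p_facet)
head(hex_counts[, c("PANEL", "x", "y", "count")])

# Number of points binned in each panel
aggregate(count ~ PANEL, data = hex_counts, FUN = sum)
```

Each `PANEL` value corresponds to one facet, so the counts can be inspected or summarised per group without relying on the color legend alone.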

Steen Harsted