
I have a data frame that looks like this:

> head(df)
  DGene JGene cdr3_len Sum
1 IGHD1 IGHJ1        0  22
2 IGHD1 IGHJ1        1  11
3 IGHD1 IGHJ1        2  16
4 IGHD1 IGHJ1        3  40
5 IGHD1 IGHJ1        4  18
6 IGHD1 IGHJ1        5  30
...

Faceting it with facet_grid is straightforward:

library(ggplot2)

ggplot(df, aes(x = cdr3_len, y = Sum)) +
  geom_line() +
  xlim(c(1, 42)) +
  facet_grid(JGene ~ DGene, scales = "free_y")

which gives something like this:

[Image: faceted line plots of Sum against cdr3_len, one panel per JGene/DGene combination]

Could anyone help me add an hline at the mean of each panel? Alternatively, how could I print each panel's mean in its top-right corner?

Thanks,

Edit: full link to the data frame

Edited by zx8754; asked by jwillis0720.

1 Answer


Here's a way to add both text and a vertical line for the mean of cdr3_len by pre-computing the desired values (per @jwillis0720's comment):

First, calculate the count-weighted mean of cdr3_len for each panel (each length weighted by its Sum), then left_join that data frame to a second data frame that gives the y-value for placing the text on each panel. The y-value only needs to vary by level of JGene, because with scales="free_y" each JGene row of the grid gets its own y-axis.

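library(dplyr)
library(ggplot2)

# Count-weighted mean of cdr3_len for each JGene/DGene panel,
# joined to a per-JGene y-position for the text label
meanData = df %>% group_by(JGene, DGene) %>%
  summarise(meanCDR = sum(Sum*cdr3_len)/sum(Sum)) %>%
  left_join(df %>% group_by(JGene) %>%
              summarise(ypos = 0.9*max(Sum)),
            by = "JGene")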
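The weighted mean in meanCDR can also be written with base R's weighted.mean(), which computes sum(x * w) / sum(w). A minimal sketch, assuming the same column names; the toy data frame below is hypothetical and made up for illustration, not the asker's data:

```r
library(dplyr)

# Hypothetical toy data: two panels, with counts (Sum) per cdr3_len
toy <- data.frame(
  DGene    = c("IGHD1", "IGHD1", "IGHD1", "IGHD2", "IGHD2"),
  JGene    = "IGHJ1",
  cdr3_len = c(1, 2, 3, 1, 2),
  Sum      = c(1, 2, 1, 3, 1)
)

# Count-weighted mean length per panel, computed two equivalent ways
toyMeans <- toy %>%
  group_by(JGene, DGene) %>%
  summarise(meanCDR  = sum(Sum * cdr3_len) / sum(Sum),
            meanCDR2 = weighted.mean(cdr3_len, Sum)) %>%
  ungroup()

print(toyMeans)
```

For IGHD1 this is (1*1 + 2*2 + 3*1) / 4 = 2, and for IGHD2 it is (1*3 + 2*1) / 4 = 1.25.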

Now for the plot:

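ggplot(df, aes(x = cdr3_len, y = Sum)) +
  # dotted red vertical line at each panel's mean length
  geom_vline(data = meanData, aes(xintercept = meanCDR), colour = "red", lty = 3) +
  geom_line() +
  # mean printed in red, right-aligned at x = 40,
  # at 90% of the largest Sum in that JGene row
  geom_text(data = meanData,
            aes(label = round(meanCDR, 1), x = 40, y = ypos), colour = "red",
            hjust = 1) +
  xlim(c(1, 42)) +
  facet_grid(JGene ~ DGene, scales = "free_y")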
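For the second part of the question (printing each panel's mean in its top-right corner), one alternative is to place the label at x = Inf, y = Inf with hjust and vjust slightly above 1. ggplot2 maps Inf to the edge of each panel, so no per-JGene ypos column is needed even with scales = "free_y". This is a sketch assuming the same column names; the toy data is hypothetical:

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data in the same shape as df
toy <- expand.grid(cdr3_len = 1:10,
                   DGene = c("IGHD1", "IGHD2"),
                   JGene = c("IGHJ1", "IGHJ2"),
                   stringsAsFactors = FALSE)
toy$Sum <- (toy$cdr3_len %% 4) + 1

# Count-weighted mean length per panel
meanData <- toy %>%
  group_by(JGene, DGene) %>%
  summarise(meanCDR = weighted.mean(cdr3_len, Sum)) %>%
  ungroup()

p <- ggplot(toy, aes(x = cdr3_len, y = Sum)) +
  geom_line() +
  geom_vline(data = meanData, aes(xintercept = meanCDR),
             colour = "red", linetype = 3) +
  # Inf pins the label to the top-right corner of every panel;
  # hjust/vjust slightly above 1 pull it just inside the border
  geom_text(data = meanData,
            aes(x = Inf, y = Inf, label = round(meanCDR, 1)),
            hjust = 1.1, vjust = 1.5, colour = "red") +
  facet_grid(JGene ~ DGene, scales = "free_y")

print(p)
```

The trade-off is that the label position no longer depends on the data, so it can overlap a line that peaks in the corner; the original ypos approach gives more control over that.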

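[Image: the same faceted plot, with a dotted red vertical line at each panel's mean cdr3_len and the mean value printed in red near the top right]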

Answered by eipi10.
  • So close! I do need cdr3_len, but the Sum values are counts of each length, so I need meanSum to be the mean length. The formula is something like cdr3_len*Sum/sum(Sum), but I can't quite figure out how to write it. – jwillis0720 Oct 25 '15 at 23:41
  • Something like `meanData = df %>% group_by(JGene, DGene) %>% summarise(meanSum = Sum*cdr3_len/sum(Sum)) %>% left_join(df %>% group_by(JGene) %>% summarise(ypos = 0.9*max((Sum*cdr3_len)/sum(Sum))))` – jwillis0720 Oct 25 '15 at 23:42
  • I had a feeling that's what you wanted, but you said `hline` in your question, so I went with that. See updated code. You don't need to change the calculation of `ypos`, because that's calculated based on the maximum value of `Sum` for each level of `JGene`. – eipi10 Oct 25 '15 at 23:48
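The expression in the comments, cdr3_len*Sum/sum(Sum), returns one value per row rather than one per panel; wrapping it in sum() collapses it to the count-weighted mean, which is what the answer's sum(Sum*cdr3_len)/sum(Sum) computes. A small worked check on hypothetical counts for a single panel:

```r
# Hypothetical counts for one panel
cdr3_len <- c(1, 2, 3)
Sum      <- c(1, 2, 1)

# Per-row terms: each length's contribution to the weighted mean
terms <- cdr3_len * Sum / sum(Sum)
print(terms)    # [1] 0.25 1.00 0.75

# Summing the per-row terms gives the count-weighted mean length
meanLen <- sum(terms)
print(meanLen)  # [1] 2
```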