
I have the following data. Each observation is a genomic coordinate with a copy number change (copy.number.type) that is found in some percentage of samples (per.found).

chr<-c('1','12','2','12','12','4','2','X','12','12','16','16','16','5'
              ,'4','16','X','16','16','4','1','5','2','4','5','X','X','X','4',
              '1','16','16','1','4','4','12','2','X','1','16','16','2','1','12',
              '2','2','4','4','2','1','5','X','4','2','12','16','2','X','4','5',
              '4','X','5','5')

start <- c(247123880,91884413,88886155,9403011,40503634,10667741,88914884,
                      100632615,25804205,25803542,18925987,21501823,21501855,115902990,
                      26120955,22008406,432498,22008406,22008406,69306802,4144380,73083197,
                      47743372,34836043,16525257,315832,1558229,51048657,49635818,239952709,
                      69727769,27941625,80328938,49136485,49136654,96076105,133702693,315823,
                      16725215,69728318,88520557,89832606,202205081,124379013,16045662,89836880,
                      49657307,97117994,76547133,35051701,344973,1770075,49139874,77426085,
                      9406416,69727781,108238962,151006944,49121333,6669602,89419843,74214551,
                      91203955,115395615)

type <- c('Inversions','Deletions','Deletions','Deletions','Deletions','Duplications','Deletions','Deletions',
          'Duplications','Deletions','Duplications','Inversions','Inversions','Deletions','Duplications',
          'Deletions','Deletions','Deletions','Deletions','Inversions','Duplications','Inversions','Inversions',
          'Inversions','Deletions','Deletions','Deletions','Insertions','Deletions','Inversions','Inversions',
          'Inversions','Inversions','Deletions','Deletions','Inversions','Deletions','Deletions','Inversions',
          'Inversions','Deletions','Deletions','Deletions','Insertions','Inversions','Deletions','Deletions',
          'Deletions','Inversions','Deletions','Duplications','Inversions','Deletions','Deletions','Deletions',
          'Inversions','Deletions','Inversions','Deletions','Inversions','Inversions','Inversions','Deletions','Deletions')


per.found <- c(-0.040,0.080,0.080,0.040,0.080,0.040,0.080,0.040,0.040,0.120,0.040,-0.080,-0.080,0.040,0.040,0.120,
               0.040,0.120,0.120,-0.040,0.011,-0.011,-0.023,-0.023,0.011,0.023,0.011,0.011,0.011,-0.011,-0.034,
               -0.011,-0.023,0.011,0.011,-0.011,0.023,0.023,-0.023,-0.034,0.011,0.023,0.011,0.011,-0.023,0.023,
               0.011,0.011,-0.011,0.011,0.011,-0.023,0.011,0.057,0.011,-0.034,0.023,-0.011,0.011,-0.011,-0.023,
               -0.023,0.011,0.011)

df <- data.frame(chromosome = chr, start.coordinate = start, copy.number.type = type, per.found = per.found )

I would like to create a line plot. I created a plot using ggplot2 with facets, but the problem is that I cannot connect the points between two facets. Is there any way to do that? I do not necessarily need to use facets if there is a way to annotate the x axis by chromosome. In the image below, the dotted line shows what I would like to have for all copy.number.type lines.

EDIT: Looking for a simplified approach.

library(ggplot2)
ggplot(df, aes(x=start.coordinate,y=per.found, group=copy.number.type, color=copy.number.type))+
  geom_line()+
  geom_point()+
  facet_grid(.~chromosome,scales = "free_x", space = "free_x")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Desired output, as shown by the red dashed lines in the image: I want to connect all the border points with a dashed line across facets.

Andre Elrico
bishwo
  • [THIS](https://stackoverflow.com/questions/31690007/ggplot-drawing-line-between-points-across-facets) does not help? – Andre Elrico Jun 27 '18 at 07:22
  • Oh! I totally missed that thread while searching. I tried it, but the `moveToGrob` function is not found. – bishwo Jun 27 '18 at 07:35
  • In my opinion the example is way too loaded/big/complicated to construct a dynamic and scalable solution. Try to add an example with maybe 2 types and 3 facets. – Andre Elrico Jun 27 '18 at 07:35
  • I believe moveToGrob is from library grid – Andre Elrico Jun 27 '18 at 07:37
  • Importing grid worked. It looks like a massive amount of code, as I have 25 facets in the actual data. – bishwo Jun 27 '18 at 07:39
  • You should add a simplified example under your question that has the essence of your problem, and ask for a "general solution". I believe this will get more ppl into trying stuff out. – Andre Elrico Jun 27 '18 at 07:45
  • Can you explain what's the logic to connect chromosomes? – pogibas Jun 27 '18 at 07:48
  • Yes, chromosomes are not connected and typically arbitrarily ordered by their size. Does it make biological sense to connect these lines at all? – Axeman Jun 27 '18 at 07:55
  • I would like to connect dots of the same group (same type of CNVs). – bishwo Jun 27 '18 at 08:30
  • Yes we get that. But there is no biological continuity between the end of chromosome 1 and the start of chromosome 2... – Axeman Jun 27 '18 at 09:09
  • I agree with you, but I have seen similar plots in some research articles. That is the reason why I also would like to summarize my results in the same way. – bishwo Jun 27 '18 at 10:40
  • If you agree I would advise to go with your own opinion, instead of copying what others are doing! – Axeman Jun 27 '18 at 11:24

1 Answer

Note: it may not make sense to connect the lines between the chromosomes.

But here is one way, by avoiding facets:

library(dplyr)
library(ggplot2)

# Fix the chromosome order, then sort positions within each chromosome.
df2 <- df %>% 
  mutate(chromosome = factor(chromosome, c(1, 2, 4, 5, 12, 16, 'X'))) %>% 
  arrange(chromosome, start.coordinate)

# Lay the observed span of each chromosome end to end on one shared x axis.
chromosome_positions <- df2 %>% 
  group_by(chromosome) %>% 
  summarise(start = first(start.coordinate), end = last(start.coordinate)) %>% 
  mutate(
    size = end - start,
    new_start = cumsum(lag(size, default = 0)),
    new_end = new_start + size
  )

# Shift every coordinate by the offset of its chromosome.
df3 <- df2 %>% 
  left_join(chromosome_positions, by = 'chromosome') %>% 
  mutate(new_x = start.coordinate + (new_start - start))

ggplot(df3, aes(x=new_x,y=per.found, group=copy.number.type, color=copy.number.type))+
  # Alternating grey bands mark the chromosomes in place of facets.
  geom_rect(
    aes(xmin = new_start, xmax = new_end, ymin = -Inf, ymax = Inf, fill = chromosome), 
    chromosome_positions, inherit.aes = FALSE, alpha = 0.3
  ) +
  geom_line() +
  geom_point() +
  # Chromosome name at the top centre of each band.
  geom_text(
    aes(x = new_start + 0.5 * size, y = Inf, label = chromosome),
    chromosome_positions, inherit.aes = FALSE, vjust = 1
  ) + 
  scale_fill_manual(values = rep(c('grey60', 'grey90'), 10), guide = 'none') +
  scale_x_continuous(expand = c(0, 0))
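
To see what the `new_start` offsets are doing, here is a minimal base R sketch on made-up spans. The chromosome names `A`, `B`, `C` and all positions are hypothetical, not taken from the data in the question. It mirrors the `cumsum(lag(size, default = 0))` step: each chromosome's observed span is placed directly after the spans before it, and every position is shifted accordingly.

```r
# Toy illustration of the offset arithmetic; all values are made up.
toy <- data.frame(
  chromosome = c("A", "A", "B", "B", "C", "C"),
  pos        = c(100, 400, 50, 250, 10, 20),
  stringsAsFactors = FALSE
)

# First observed position and observed span per chromosome.
by_chr    <- split(toy$pos, toy$chromosome)
first_pos <- vapply(by_chr, min, numeric(1))
size      <- vapply(by_chr, max, numeric(1)) - first_pos

# Each chromosome starts where the previous spans end.
new_start <- cumsum(c(0, size[-length(size)]))
names(new_start) <- names(size)

# Shift every position onto the shared axis.
toy$new_x <- unname(toy$pos - first_pos[toy$chromosome] + new_start[toy$chromosome])
print(toy)
```

Note that with this layout the last point of one chromosome and the first point of the next share the same x value, because `new_end` of one band equals `new_start` of the following one.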

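
The question also asks about annotating the x axis by chromosome. One possible variation, sketched here on a small hypothetical table shaped like `chromosome_positions` (the `bands` and `pts` data frames and their values are made up for illustration), is to put the axis breaks at the midpoint of each band and use the chromosome names as tick labels instead of drawing them with `geom_text()`.

```r
library(ggplot2)

# Hypothetical stand-in for chromosome_positions; values are made up.
bands <- data.frame(
  chromosome = c("1", "2", "X"),
  new_start  = c(0, 300, 500),
  new_end    = c(300, 500, 600)
)
bands$mid <- (bands$new_start + bands$new_end) / 2

# Hypothetical points already shifted onto the shared axis.
pts <- data.frame(
  new_x     = c(0, 300, 500, 600),
  per.found = c(0.04, 0.08, -0.02, 0.01)
)

p <- ggplot(pts, aes(x = new_x, y = per.found)) +
  geom_rect(
    aes(xmin = new_start, xmax = new_end, ymin = -Inf, ymax = Inf, fill = chromosome),
    data = bands, inherit.aes = FALSE, alpha = 0.3
  ) +
  geom_line() +
  geom_point() +
  scale_fill_manual(
    values = rep(c("grey60", "grey90"), length.out = nrow(bands)),
    guide = "none"
  ) +
  # One tick per chromosome, placed at the centre of its band.
  scale_x_continuous(breaks = bands$mid, labels = bands$chromosome, expand = c(0, 0)) +
  labs(x = "chromosome")
print(p)
```

The trade-off is that the genomic coordinates no longer appear on the axis, so this suits an overview plot better than one where exact positions need to be read off.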

Axeman