3

I'm trying to plot a copy number variation profile plot in R. This is what I'm looking for but with all the cells in my data.

I'm looking to make something like this but with all the cells

Ploidy is on the Y axis and chromosome number is on the X axis

This is my data and this is what I've tried so far but it's not giving me what I'm looking for

input <- data.frame(chrom = sample("chr1"),start = sample(c(780000, 2920000, 4920000)), stop=sample(c(2920000, 4920000, 692000)), cell0=sample(1), cell1=sample(1,3,1),cell2=sample(2,1,2)
ggplot(input, aes(x=chrom, y=cell_0, group=1)) +
  geom_point() +
  geom_line(color = "#00AFBB", size = 1) 

Here's a link to the whole file

https://pastebin.com/440AX3Dr

When I run the code in the answer this is what I get. I'm hoping that all the chromosomes can be horizontal like the above plot.

Link to output

M--
  • 25,431
  • 8
  • 61
  • 93
bioadg
  • 53
  • 6
  • 1
    Welcome to Stack Overflow! You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Do not copy/paste your data or insert it as a picture. – M-- Dec 09 '19 at 22:45

1 Answers1

3

We can use facet_wrap to put each chrom side by side. I used bunch of formatting variables to make the plot look like what you showed above. I also made my own data with two chrom for better illustration. Look below;

read.table(text="chrom start stop cell_0 cell_1 cell_2
chr1 780000 2920000 2 2 2
chr1 2920000 4920000 1 2 3
chr1 4920000 6920000 2 3 2
chr2 480000 1920000 1 2 3
chr2 1920000 2920000 2 2 2
chr2 2920000 3920000 1 3 3", header=T) -> input
library(ggplot2)
library(tidyr)

input %>% 
  pivot_longer(c(start,stop)) %>% 
    ggplot(., aes(x=value, y=as.factor(cell_0), group=1L)) +
      geom_point(colour="grey") +
      facet_wrap(~chrom, strip.position = "bottom", scales = "free_x") +
      geom_line(color = "#00AFBB", size = 1) +
      theme_bw() +
      theme(panel.spacing.x=unit(0, "lines"),
            panel.spacing.y=unit(0, "lines"),
            axis.title.x=element_blank(),
            axis.text.x=element_blank(),
            axis.ticks.x=element_blank(),
            strip.background = element_rect(color="black", fill="white")) +
      scale_x_continuous(expand = c(.01, 0)) +
      scale_y_discrete("ploidy", expand = c(.3,.3)) +
      ggtitle("cell_596, 2Mb resoloution, mean ploidy 3.04")

Updated solution for the whole data

I added another column to show how this can work for two cell columns. However, this will be very crowded plot.

# input <- read.table(file = "clipboard", header=T)
## read data from pastebin

library(ggplot2)
library(tidyr)
library(dplyr)

set.seed(123)

input %>% 
  mutate(cell_1 = cell_0  + 
         sample.int(1, 1417, replace = T) * sample(c(-1,1),1417, replace = T)) %>% 
  pivot_longer(c(start,stop), names_to = "step", values_to = "time") %>% 
  pivot_longer(c(cell_0,cell_1), names_to = "cell", values_to = "ploidy") %>% 
  ggplot(data=., aes(x=time, y=as.factor(ploidy), group=cell)) +
  geom_point(aes(colour=cell)) +
  facet_wrap(~chrom, strip.position = "bottom", scales = "free_x", nrow=1) +
  geom_line(aes(color = cell), size = 1, alpha=0.5) +
  theme_bw() +
  scale_x_continuous(expand = c(.01, 0)) +
  scale_y_discrete("ploidy", expand = c(.1,.1)) +
  theme(panel.spacing.x=unit(0, "lines"),
        panel.spacing.y=unit(0, "lines"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.background = element_rect(color="black", fill="white"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("cell_596, 2Mb resoloution, mean ploidy 3.04")

Final Update:

library(ggplot2)
library(tidyr)
library(dplyr)
library(stringr)

input %>% 
  pivot_longer(c(start,stop), names_to = "step", values_to = "time") %>% 
  mutate(chrom = factor(chrom, levels = str_sort(unique(chrom), numeric = T))) %>% 
  ggplot(data=., aes(x=time, y=as.factor(cell_0), group=1L)) +
  geom_point(colour="grey", size=0.5) +
  geom_line(color = "#00AFBB", size = 1, alpha=0.5) +
  facet_wrap(~as.factor(chrom), 
             strip.position = "bottom", scales = "free_x", nrow=1) +
  theme_bw() +
  scale_x_continuous(expand = c(.01, 0)) +
  scale_y_discrete("ploidy", expand = c(.1,.1)) +
  theme(panel.spacing.x=unit(0, "lines"),panel.spacing.y=unit(0, "lines"),
        axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        strip.background = element_rect(color="black", fill="white"),
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black"),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("cell_596, 2Mb resoloution, mean ploidy 3.04")

Created on 2019-12-10 by the reprex package (v0.3.0)

Community
  • 1
  • 1
M--
  • 25,431
  • 8
  • 61
  • 93
  • Thank you! Is there a way to plot cell_0, cell_1, and cell_2 all in the same plot? – bioadg Dec 10 '19 at 00:06
  • When I run that code on my entire file it stacks up the different chromosomes (I have 24) on top of each other. How can I make sure they all end up in a horizontal row? – bioadg Dec 10 '19 at 03:19
  • Thanks! How would I change the code you just added to make it go make to just cell_0? – bioadg Dec 10 '19 at 21:19