
How can I get R to output a consistent start, end, and tick interval for the x-axis of bar plots? I have two data sets (matrices) that I would like to plot with the x-axis starting at 220, ending at 360, and using the same tick-mark intervals. The purpose of the plots is to compare the two data sets easily. The attached example shows two plots, each with a bar at 274; however, the scales differ slightly, so the bars don't line up.

Here is the R code I'm using:

barplot(as.matrix(new_data), ylim = c(0,topmax), ylab = "Reads", 
        col = rainbow(30), cex.lab=1.5, cex.axis=1.5, cex.sub=1.5, col.lab = "blue")
axis(1, seq(220,360))

Here is an example of the data matrix I'm using. Assume the matrix is the same for both data sets.

clonename = c("IGH_V4", "IGH_V2", "IGH_V8", "IGH_V7")
readlength = c(456, 654, 457, 345)

P <- matrix(c(0,55,0,65,0,0,4,100,0,0,67,6,0,56,0,0), nrow = 4, byrow = TRUE, dimnames = list(clonename, readlength))
print(P)

Thank you for your help in advance

user3781528
    Please attach a sample of your data (you can use `dput`) to make this example [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – bouncyball Sep 06 '19 at 14:03

1 Answer


I'm sure there's a way to do this with base R graphics, which you used to make your plots. However, I find that the package ggplot2 makes this sort of thing much easier. In the example below, I start from the data you provided as a matrix, then reshape it into a long data frame that ggplot2 can plot.

library(dplyr) # for mutate and tibble
library(reshape2) # for melt function
library(ggplot2) # for plotting

# Data as you provided. The vectors are named row_names/col_names so they
# don't mask the base functions rownames() and colnames().
row_names <- c("IGH_V4", "IGH_V2", "IGH_V8", "IGH_V7")
col_names <- c(456, 654, 457, 345)

reads.m <- matrix(c(0,55,0,65,0,0,4,100,0,0,67,6,0,56,0,0), 
              nrow = 4, byrow = TRUE, 
              dimnames = list(row_names, col_names))

# Melt the matrix into long form: one row per (row name, column name) pair
reads.long <- melt(reads.m)

# Convert the melted data frame to a tibble (a type of data frame with
# some nice extra features)
reads.df <- as_tibble(reads.long)

# Name the columns
names(reads.df) <- c("Sequence", "Clone", "Reads")

# Create a data frame called diluted with an extra column indicating its
# concentration
diluted.df <- mutate(reads.df, Concentration = "Diluted")

# Do the same for undiluted aka straight then add a clone to change the
# range of clones. Because clones are on the x-axis of your plot, having
# different ranges of data is what presents the problem you're trying to 
# solve by aligning the plot axes.
straight.df <- mutate(reads.df, 
                      Concentration = "Straight",
                      Reads = round(Reads*1.2))
straight.df <- bind_rows(straight.df, tibble(Sequence = NA, 
                                             Clone = 700, 
                                             Reads = 8, 
                                             Concentration = "Straight"))

# Concatenate the tables for the two dilutions.
reads.df <- bind_rows(diluted.df, straight.df)

# Sanity check
print(reads.df, n = nrow(reads.df))

# # A tibble: 33 x 4
# Sequence Clone Reads Concentration
# <fct>    <dbl> <dbl> <chr>        
# 1 IGH_V4     456     0 Diluted      
# 2 IGH_V2     456     0 Diluted      
# 3 IGH_V8     456     0 Diluted      
# 4 IGH_V7     456     0 Diluted      
# 5 IGH_V4     654    55 Diluted      
# 6 IGH_V2     654     0 Diluted      
# 7 IGH_V8     654     0 Diluted      
# 8 IGH_V7     654    56 Diluted      
# 9 IGH_V4     457     0 Diluted      
# 10 IGH_V2     457     4 Diluted      
# 11 IGH_V8     457    67 Diluted      
# 12 IGH_V7     457     0 Diluted      
# 13 IGH_V4     345    65 Diluted      
# 14 IGH_V2     345   100 Diluted      
# 15 IGH_V8     345     6 Diluted      
# 16 IGH_V7     345     0 Diluted      
# 17 IGH_V4     456     0 Straight     
# 18 IGH_V2     456     0 Straight     
# 19 IGH_V8     456     0 Straight     
# 20 IGH_V7     456     0 Straight     
# 21 IGH_V4     654    66 Straight     
# 22 IGH_V2     654     0 Straight     
# 23 IGH_V8     654     0 Straight     
# 24 IGH_V7     654    67 Straight     
# 25 IGH_V4     457     0 Straight     
# 26 IGH_V2     457     5 Straight     
# 27 IGH_V8     457    80 Straight     
# 28 IGH_V7     457     0 Straight     
# 29 IGH_V4     345    78 Straight     
# 30 IGH_V2     345   120 Straight     
# 31 IGH_V8     345     7 Straight     
# 32 IGH_V7     345     0 Straight     
# 33 NA         700     8 Straight     

ggplot(reads.df, aes(x = Clone, y = Reads, fill = Concentration)) +
  geom_col(width = 4) +
  facet_grid(rows = vars(Concentration)) +
  theme_bw() + # White background, rather than default grey
  ylab("Reads")


ggplot(reads.df, aes(x = Clone, y = Reads, fill = Concentration)) +
  geom_col(width = 5) +
  facet_wrap(facets = vars(Concentration)) +
  theme_bw() +  # White background, rather than default grey
  ylab("Reads")

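Faceting already puts both panels on one shared x-axis. To also pin that axis to the 220 to 360 range asked about in the question, something like the following sketch could work. The small `reads` data frame and the tick spacing of 20 are made up for illustration; only the 220 and 360 endpoints come from the question.

```r
library(ggplot2)

# Hypothetical read counts per clone for two samples (not the asker's data)
reads <- data.frame(
  Clone = c(250, 274, 300, 274, 330),
  Reads = c(10, 80, 5, 95, 12),
  Concentration = c("Diluted", "Diluted", "Diluted", "Straight", "Straight")
)

p <- ggplot(reads, aes(x = Clone, y = Reads, fill = Concentration)) +
  geom_col(width = 2) +
  facet_grid(rows = vars(Concentration)) +
  # Same tick positions in every panel; the spacing of 20 is an assumption
  scale_x_continuous(breaks = seq(220, 360, by = 20)) +
  # Zoom to the requested range without dropping any rows from the data
  coord_cartesian(xlim = c(220, 360)) +
  theme_bw() +
  ylab("Reads")

print(p)
```

`coord_cartesian()` is used here rather than `limits =` inside the scale because scale limits remove out-of-range rows before plotting, whereas `coord_cartesian()` only changes the visible window.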

Note: ggplot is returning a warning that looks like this:

# Warning messages:
# 1: position_stack requires non-overlapping x intervals 
# 2: position_stack requires non-overlapping x intervals 

This is the result of manually setting the width of geom_col. Without setting that width manually, the columns are so narrow that they are hard to read. In this sample the only x values closer together than the bar width are clones 456 and 457, and clone 456 has zero reads everywhere, so no visible bars overlap and we can ignore the warning here.
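For the base R route mentioned at the top of this answer, one possible approach is to pad both data sets onto the same grid of x positions so that `barplot()` draws the same number of bars in each plot. This is only a sketch: `pad_counts()` and `plot_aligned()` are hypothetical helpers, the counts are made up, and the 220:360 grid with ticks every 20 is an assumption based on the question.

```r
# Hypothetical counts: names are x positions, values are total reads
diluted  <- c("250" = 10, "274" = 80, "300" = 5)
straight <- c("274" = 95, "330" = 12)

# Expand a named count vector onto a fixed grid of positions,
# using zero wherever a position has no data
pad_counts <- function(counts, positions = 220:360) {
  out <- setNames(numeric(length(positions)), positions)
  keep <- names(counts) %in% names(out)
  out[names(counts)[keep]] <- counts[keep]
  out
}

plot_aligned <- function(counts, main, positions = 220:360, tick_by = 20) {
  padded <- pad_counts(counts, positions)
  # barplot() returns the x coordinate of each bar's midpoint
  mids <- barplot(padded, axisnames = FALSE, main = main, ylab = "Reads")
  ticks <- seq(min(positions), max(positions), by = tick_by)
  # Place the tick labels at the midpoints of the matching bars
  axis(1, at = mids[match(ticks, positions)], labels = ticks)
  invisible(padded)
}

op <- par(mfrow = c(2, 1))
plot_aligned(diluted, "Diluted")
plot_aligned(straight, "Straight")
par(op)
```

The original `axis(1, seq(220,360))` call does not line up because `barplot()` places bars at its own internal coordinates rather than at the data values, which is why the ticks above are positioned using the midpoints it returns. Starting from a matrix, `colSums()` would give a named vector of the kind used here.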

Gregory
  • Thanks, you also need the tidyverse library for tibble. How do I add two matrices to a ggplot data frame using your syntax? I've tried: `df <- tibble(Replicates = factor(c(rep("Straight", 116), rep("Diluted", 108))),data=melt(matrixA),data=melt(matrixB)` Thanks. – user3781528 Sep 06 '19 at 17:11
  • If you'll add to your question a sample of what your data look like then we can help you figure out how to wrangle it into the shape you need. Could you add some code to produce a matrixA and matrixB? – Gregory Sep 06 '19 at 17:45
  • Okay, based on the matrix you provided I now understand that the data you're starting with has counts of reads for each clone rather than records for individual reads with an identifier for which clone produced it. That difference requires not just wrangling the matrices into a data frame, but also slightly different coding for the plot. This isn't a big deal, but in the future it would save time if you would include sample data with your initial question. – Gregory Sep 06 '19 at 19:15
  • Sorry! Could you please update your answer with the modified code? Thanks. – user3781528 Sep 06 '19 at 19:21
  • Do I understand correctly that you want to discard the information about which sequence produced the reads, lumping sequences together by clone? – Gregory Sep 06 '19 at 19:27
  • Yes, we only care about the amount of clone present. Clone names should be excluded. There are usually hundreds of clones and we only want to see if one or two stand out. – user3781528 Sep 06 '19 at 19:31
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199091/discussion-between-gregory-and-user3781528). – Gregory Sep 06 '19 at 19:37
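The modified code from the chat is not reproduced here. As a rough sketch of what the comments describe (two count matrices combined into one data frame, with sequences lumped together by clone), something like the following might work. `matrixA` and `matrixB` are small made-up stand-ins, `to_long()` is a hypothetical helper, and the column names follow the answer above.

```r
library(dplyr)
library(reshape2)

# Hypothetical stand-ins for matrixA and matrixB (rows: sequences, columns: clones)
matrixA <- matrix(c(0, 55, 65, 0, 4, 100), nrow = 2, byrow = TRUE,
                  dimnames = list(c("IGH_V4", "IGH_V2"), c(250, 274, 300)))
matrixB <- matrix(c(10, 0, 0, 7), nrow = 2, byrow = TRUE,
                  dimnames = list(c("IGH_V4", "IGH_V8"), c(274, 330)))

# Melt one matrix to long form and tag every row with its sample label
to_long <- function(m, label) {
  long <- melt(m, varnames = c("Sequence", "Clone"), value.name = "Reads")
  mutate(long, Sequence = as.character(Sequence), Concentration = label)
}

# Stack both samples, then sum reads over sequences within each clone
reads_by_clone <- bind_rows(to_long(matrixA, "Straight"),
                            to_long(matrixB, "Diluted")) %>%
  group_by(Concentration, Clone) %>%
  summarise(Reads = sum(Reads), .groups = "drop")

print(reads_by_clone)
```

The resulting data frame has one row per sample and clone, so it can be passed to the `ggplot()` calls in the answer in place of `reads.df`.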