
I have been trying for months to figure out how to do this, so hopefully somebody can give me some clarity. I have created an R script that displays all of the values in my database's Genes table. So it gives the length (in nucleotides) of each gene, and I lay it out horizontally.

The main idea was to take values from another table called QGRS, which contains the lengths of each QGRS. The issue I am having is that there are many QGRS's on a single gene, so I can't figure out how to use R to show this. There may be a better way, but my idea was to have the horizontal gene lengths bars be one color, and have the QGRS lengths appear right over those bars as a different color to highlight the QGRS's location on the gene. And this is for all of the genes. I don't understand how to get multiple values over a single bar, and then how to superimpose the two graphs properly.

I hope this makes sense. Here is what I have:

        ############## Setup for Chromosome Lengths ############## 
quartz()    #for mac
genes<-read.delim("genes_plot.out", header=FALSE)
sortmat<-genes[order(genes$V2),]

#qgrs<-read.delim("qgrs_plot.out", header=FALSE)

    ############## Generate Horizontal Genes Plot ############## 
par(bg = "darkseagreen1")
png("genes_plot.png", height=1500, width=1000)  
barplot( sortmat[,3], names.arg=sortmat[,1], col="red",      #genes
    horiz=TRUE,
    las='1',
    main="Human QGRS by Genes",
    ylab="",
    space=30,
    xlab="Length of Gene [# of nucleotides]")
mtext("Gene ID [RefSeq]", side=2, line=2)   #side must be 1-4; 2 is the left axis

    ############## Include QGRS over Genes Plot ############## 
#par(new=TRUE)  #to superimpose the two barplots
#barplot( qgrs[,1] )    #qgrs

        ############## End ############## 
dev.off()

And here is what it outputs [long picture!]: [image: horizontal bar plot of gene lengths]

** Note, the numbers on the left are cut off a bit, I have no idea why... but they are the gene IDs straight from NCBI, just a reference to label them as.


Let me know if more information is needed. Please, any help I would greatly appreciate. I really tried to search for the answers for months (this entire past semester), but I don't think I'm very competent at this. It's too complex for me.

Now I know that I could make another graph for the QGRS but if it was this same way, they would each come out on different lines! So that's not helpful.

Also, my Genes table works like this. I have 5 genes per chromosome, for all the chromosomes in the human genome (24 if you count the X and Y separately). So if needed, the genes graph too could be combined to have only 24 lines and where each line consists of the 5 genes, but I doubt this helps.

--------EDIT------------

Here is sample data from the Genes table, the 5 genes for chromosomes 1 and 2:

GeneID      Chromosome   Length [nucleotides]
8682        1            10037
10623       1            18364
441878      1            858
645382      1            2986
650901      1            917
841         2            54268
28949       2            38
80303       2            76724
729992      2            1072
100873329   2            106

And here is sample data from the QGRS table (just a few lines for Gene '8682', the first line in the above sample data):

   Start    End    Chromosome
    389     391     1
    524     526     1
    739     741     1
    834     836     1
    904     906     1
byobob
  • In order to more easily assist you, it would be helpful to paste sample data as well, so that we can run your code and your problem is [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). It doesn't have to be real data and it doesn't have to be long, but something that resembles your input would be beneficial. – MrFlick Jun 01 '14 at 23:22

1 Answer


OK, well I had to adjust the test data a bit. The ranges of your QGRS values were so small in comparison to the scale of the lengths that they basically didn't draw. They were less than a pixel wide in most cases. So I beefed them up. I also added a GeneID column so I could match them up.

#sample data
genes<-data.frame(
    GeneID = c(8682, 10623, 441878, 645382, 650901, 
        841, 28949, 80303, 729992, 100873329), 
    Chromosome = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
    Length = c(10037, 18364, 858, 2986, 917, 54268, 
        38, 76724, 1072, 106)
)

qgrs<-data.frame(
    Start = c(500, 7500, 2200, 25000, 61000), 
    End = c(1100, 8200, 3400, 30000, 64000), 
    Chromosome = c(1L, 1L, 2L, 2L, 2L),
    GeneID=c(8682, 8682, 80303, 80303, 80303)
)

So I wanted to explicitly set a width, and I also capture the output of barplot so I know where each value was plotted.

ww<-20
bb<-barplot( genes[,3], names.arg= genes[,1], 
    col="red", horiz=TRUE,
    las='1', main="Human QGRS by Genes",
    ylab="", xlab="Length of Gene [# of nucleotides]",
    space=.5,
    width=ww
)
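
As a side note on what `bb` holds: `barplot` invisibly returns the midpoint of each bar along the category axis. The snippet below is only a minimal sketch, assuming the same `width` and `space` as above and using the first five sample lengths. The `pdf(NULL)` call is an assumption added so the code runs without opening a window or writing a file.

```r
# Null PDF device: an assumption so the sketch runs non-interactively
pdf(NULL)

lengths <- c(10037, 18364, 858, 2986, 917)  # first five sample gene lengths
ww <- 20

# barplot() invisibly returns the midpoint of every bar it draws
bb <- barplot(lengths, horiz = TRUE, width = ww, space = 0.5)

# With horiz = TRUE the midpoints are y coordinates, and each bar
# spans from midpoint - ww/2 to midpoint + ww/2
bar_bottom <- as.vector(bb) - ww / 2
bar_top    <- as.vector(bb) + ww / 2
print(data.frame(mid = as.vector(bb), bottom = bar_bottom, top = bar_top))

dev.off()
```

Because every bar has the same explicit width, the vertical extent of any bar can be recovered from its midpoint alone, which is what the `rect` call relies on.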

Now I look up the position for each gene on the resulting plot, then I use rect to draw rectangles on top of the existing plot at the correct places

idx <- match(qgrs$GeneID, genes$GeneID)
with(qgrs, rect(Start,bb[idx]-ww/2,End, bb[idx]+ww/2, col="green"))
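
To make the lookup step explicit, here is a small sketch of how `match` lines up each QGRS row with its gene's bar. It uses an abbreviated copy of the sample data (three genes, three QGRS rows), and `pdf(NULL)` is again an assumption so that it runs without a display.

```r
pdf(NULL)  # null device: an assumption so the sketch runs non-interactively

genes <- data.frame(
    GeneID = c(8682, 10623, 80303),
    Length = c(10037, 18364, 76724)
)
qgrs <- data.frame(
    Start  = c(500, 7500, 2200),
    End    = c(1100, 8200, 3400),
    GeneID = c(8682, 8682, 80303)
)

ww <- 20
bb <- barplot(genes$Length, names.arg = genes$GeneID, horiz = TRUE,
    las = 1, col = "red", space = 0.5, width = ww)

# For each QGRS row, match() gives the row number of its gene in genes
idx <- match(qgrs$GeneID, genes$GeneID)

# Several QGRS rows can share one idx, so several rectangles land on one bar
rect(qgrs$Start, bb[idx] - ww / 2, qgrs$End, bb[idx] + ww / 2, col = "green")

dev.off()
```

A QGRS row whose GeneID is missing from `genes` gets `NA` in `idx`, so with real data it may be worth checking `anyNA(idx)` before drawing.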

This results in the following plot.

[image: resulting plot]

Now if your regions are that tiny, you may need to set border=NA on the rect. By default R will draw a black border around the shape, and if the shape is too small, basically the border just draws over the color and it just looks black. You can also set the border to the color of the rectangles to give them a bit more width.
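
The two border options could look something like the sketch below. It uses the 2-nucleotide QGRS ranges posted in the question for gene 8682. The single-bar plot and the `pdf(NULL)` device are assumptions made to keep the example short and runnable, and in practice you would pick one of the two `rect` calls rather than both.

```r
pdf(NULL)  # null device: an assumption so the sketch runs non-interactively

ww <- 20
# A single bar for gene 8682 (length 10037)
bb <- barplot(10037, horiz = TRUE, col = "red", space = 0.5, width = ww)

# QGRS ranges as posted in the question: each only 2 nucleotides wide
qgrs_start <- c(389, 524, 739, 834, 904)
qgrs_end   <- c(391, 526, 741, 836, 906)

# Option 1: no border, so only the fill colour is drawn
rect(qgrs_start, bb[1] - ww / 2, qgrs_end, bb[1] + ww / 2,
    col = "green", border = NA)

# Option 2: border in the fill colour, which adds the line width to each mark
rect(qgrs_start, bb[1] - ww / 2, qgrs_end, bb[1] + ww / 2,
    col = "green", border = "green")

dev.off()
```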

MrFlick
  • This is VERY smart! Thank you so much! I will have to now tweak it so that it can take the data from files instead (because there is just so much!). I shouldn't be so surprised, but honestly I'm so happy you were able to help me!! This was a very simple and clever way of doing this, which I love. – byobob Jun 02 '14 at 13:11