1

I am trying to find the average over a certain window of a meta column in a GRanges object.

So I have a data object:

> gr.data
GRanges object with 10505026 ranges and 1 metadata column:
         seqnames                 ranges strand |      value
            <Rle>              <IRanges>  <Rle> | <numeric>
     [1]     chr1         [10468, 10468]      + |         0
     [2]     chr1         [10469, 10469]      - |         1
     [3]     chr1         [10470, 10470]      + |       0.5
     [4]     chr1         [10471, 10471]      - |         1
     [5]     chr1         [10483, 10483]      + |       0.6

and a window object:

gr.windows
GRanges object with 6077 ranges and 0 metadata columns:
     seqnames                 ranges strand
        <Rle>              <IRanges>  <Rle>
 [1]     chr1       [     1, 100000]      *
 [2]     chr1       [100001, 200000]      *
 [3]     chr1       [200001, 300000]      *
 [4]     chr1       [300001, 400000]      *
 [5]     chr1       [400001, 500000]      *

I then convert the data object to a RleList:

gr.RleList <-  mcolAsRleList(gr.data, varname = "value")

gr.RleList
RleList of length 3
$chr1
numeric-Rle of length 249239887 with 5816335 runs
 Lengths:             10467                 1                 1                 1 ...                 1                 1                13                 2
 Values :                NA                 0                 1               0.5 ...                 0               NaN                NA               NaN

If I then use binnedAverage, I just get NA values.

gr.binnedAvg <- binnedAverage(gr.windows, gr.RleList, "value")
gr.binnedAvg
GRanges object with 6077 ranges and 1 metadata column:
     seqnames                 ranges strand |      value
        <Rle>              <IRanges>  <Rle> | <numeric>
 [1]     chr1       [     1, 100000]      * |      <NA>
 [2]     chr1       [100001, 200000]      * |      <NA>
 [3]     chr1       [200001, 300000]      * |      <NA>
 [4]     chr1       [300001, 400000]      * |      <NA>
 [5]     chr1       [400001, 500000]      * |      <NA>

unique(mcols(gr.binnedAvg)$value)
[1] NA

I also tried to sum up the values, with the same result. Do the NA values in my list mess this up, or is there something wrong in my aproach?

Minimal example:

library(BSgenome.Hsapiens.UCSC.hg19)
library(data.table)

gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000,cut.last.tile.in.chrom=TRUE)

gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value = c(20, 10))
gr.data.RleList <-  mcolAsRleList(gr.data, varname = "value")
seqlevels(gr.windows, force=TRUE) <- names(gr.data.RleList)
gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.RleList, "value")

gr.data.binnedAvg
GRanges object with 4925 ranges and 1 metadata column:
     seqnames                 ranges strand |     value
        <Rle>              <IRanges>  <Rle> | <numeric>
 [1]     chr1       [     1, 100000]      * |      <NA>
 [2]     chr1       [100001, 200000]      * |         0
 [3]     chr1       [200001, 300000]      * |         0
 [4]     chr1       [300001, 400000]      * |         0

Thank you very much for your help!

zx8754
  • 52,746
  • 12
  • 114
  • 209
T. Keller
  • 11
  • 4
  • Can you please post a minimal reproducible example. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – emilliman5 Dec 01 '16 at 21:05
  • Sorry, I forgot. So I attached it now. I guess it is because of missing values in the window? But I cannot figure out how to avoid that. – T. Keller Dec 02 '16 at 08:16
  • `mcolAsRleList` and `binnedAverage` are not part of the libraries you loaded, nor are they in the `GenomicRanges` package – emilliman5 Dec 02 '16 at 13:39

1 Answers1

3

Using the "minimal example", the following works:

library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)

gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000, cut.last.tile.in.chrom=TRUE)
gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value=c(20, 10))

gr.data.cov <- GenomicRanges::coverage(gr.data, weight="value")

seqlevels(gr.windows, pruning.mode="coarse") <- names(gr.data.cov)

gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.cov, "value")

Late response, but may be useful for future reference.

Luis
  • 481
  • 5
  • 7