I am trying to find the average over a certain window of a meta column in a GRanges object.
So I have a data object:
> gr.data
GRanges object with 10505026 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [10468, 10468] + | 0
[2] chr1 [10469, 10469] - | 1
[3] chr1 [10470, 10470] + | 0.5
[4] chr1 [10471, 10471] - | 1
[5] chr1 [10483, 10483] + | 0.6
and a window object:
gr.windows
GRanges object with 6077 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 1, 100000] *
[2] chr1 [100001, 200000] *
[3] chr1 [200001, 300000] *
[4] chr1 [300001, 400000] *
[5] chr1 [400001, 500000] *
I then convert the data object to a RleList:
gr.RleList <- mcolAsRleList(gr.data, varname = "value")
gr.RleList
RleList of length 3
$chr1
numeric-Rle of length 249239887 with 5816335 runs
Lengths: 10467 1 1 1 ... 1 1 13 2
Values : NA 0 1 0.5 ... 0 NaN NA NaN
If I then use binnedAverage, I just get NA values.
gr.binnedAvg <- binnedAverage(gr.windows, gr.RleList, "value")
gr.binnedAvg
GRanges object with 6077 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [ 1, 100000] * | <NA>
[2] chr1 [100001, 200000] * | <NA>
[3] chr1 [200001, 300000] * | <NA>
[4] chr1 [300001, 400000] * | <NA>
[5] chr1 [400001, 500000] * | <NA>
unique(mcols(gr.binnedAvg)$value)
[1] NA
I also tried to sum up the values, with the same result. Do the NA values in my list mess this up, or is there something wrong in my aproach?
Minimal example:
library(BSgenome.Hsapiens.UCSC.hg19)
library(data.table)
gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000,cut.last.tile.in.chrom=TRUE)
gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value = c(20, 10))
gr.data.RleList <- mcolAsRleList(gr.data, varname = "value")
seqlevels(gr.windows, force=TRUE) <- names(gr.data.RleList)
gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.RleList, "value")
gr.data.binnedAvg
GRanges object with 4925 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [ 1, 100000] * | <NA>
[2] chr1 [100001, 200000] * | 0
[3] chr1 [200001, 300000] * | 0
[4] chr1 [300001, 400000] * | 0
Thank you very much for your help!