This is I assume a somewhat simple programming issue, but I've been struggling with it. Mostly because I don't know the right words to use, perhaps?
Given a set of "ranges" (in the form of 1-a set of numbers as below, 2-IRanges, or 3-GenomicRanges), I'd like to split it into a set of smaller ranges.
Example Beginning:
Chr Start End
1 1 10000
2 1 5000
Example size of breaks: 2000
New dataset:
Chr Start End
1 1 2000
1 2001 4000
1 4001 6000
1 6001 8000
1 8001 10000
2 1 2000
2 2001 4000
2 4001 5000
I'm doing this in R. I know I could generate these simply with seq
, but I'd like to be able to do it based on a list/df of regions instead of having to manually do it every time I have a new list of regions.
Here's an example I've made using seq:
Given 22 chromosomes, loop through them and break each into pieces
# initialize df
Regions <- data.frame(Chromosome = c(), Start = c(), End = c())
# for each row, do the following
for(i in 1:nrow(Chromosomes)){
# create a sequence from the minimum start to the max end by some value
breks <- seq(min(Chromosomes$Start[Chromosomes$Chromosome == i]), max(Chromosomes$End[Chromosomes$Chromosome == i]), by=2000000)
# put this into a dataframe
database <- data.frame(Chromosome = i, Start = breks, End = c(breks[2:length(breks)]-1, max(Chromosomes$End[Chromosomes$Chromosome == i])))
# bind with what we already have
Regions <- rbind(Regions, database)
rm(database)
}
This works fine, I'm wondering if there is something built into a package already to do this as a one-liner OR that is more flexible, as this has its limitations.