I have these two dataset: Before that contains 5 columns (chromsome, start, end, line number, score)
chrI 861 870 87 5
chrI 871 880 88 11
chrI 881 890 89 11
chrI 891 900 90 19
chrI 901 910 91 19
chrI 911 920 92 20
chrI 921 930 93 20
chrI 931 940 94 20
chrI 941 950 95 19
chrI 951 960 96 19
chrI 961 970 97 19
chrI 971 980 98 19
chrI 981 990 99 25
chrI 991 1000 100 20
chrI 1001 1010 101 20
chrI 1011 1020 102 20
chrI 1021 1030 103 20
chrI 1031 1040 104 15
chrI 1041 1050 105 14
chrI 1051 1060 106 14
chrI 1061 1070 107 13
chrI 1071 1080 108 13
chrI 1081 1090 109 13
chrI 1091 1100 110 7
chrI 1101 1110 111 7
Peaks that contains 4 columns (chromsome, start, end, value)
"chrI" 880 1091 383
"chrI" 1350 1601 302
"chrI" 1680 1921 241
"chrI" 2220 2561 322
"chrI" 2750 2761 18
"chrI" 3100 3481 420
"chrI" 3660 4211 793
"chrI" 4480 4491 20
"chrI" 4710 4871 195
"chrI" 5010 5261 238
For each lines of Peaks I would like to extract the corresponding lines (e.g all the lines between 880 and 1091 for the first line) in Before, find the highest score value and write it on a new file. Output
chrI 981 990 99 25
To this end, I've written this function:
summit <- function(x,y,output){
y<- Before
chrom <- x[1]
start <-x[2]
end <-x[3]
startLine <- y[which((y$V1 == chrom) & (y$V2==start)),]
endLine <- y[which((y$V1 == chrom) & (y$V3==end)),]
Subset <- y[which((y$V2 >= startLine$V2) & (y$V3 <= endLine$V2))]
maximum <- Subset[which(Subset$V4 == max(Subset$V4))]
output <- print(maximum)
}
apply(Peaks,1,summit,output = 'peaks_list.bed')
I don't have an error message but It runs during the entire night without giving me results so I guess something is wrong with my code but I don't know what.
I also try this:
Peaks_Range <- GRanges(seqnames=Peaks$V1, ranges=IRanges(start=Peaks$V2, end=Peaks$V3))
Before_Range <- GRanges(seqnames=Before$V1, ranges=IRanges(start=Before$V2, end=Before$V3),score=Before$V5)
Merged <- mergeByOverlaps(Peaks_Range,Before_Range)
Merged <- as.data.frame(Merged)
for (i in 1:nrow(Peaks)){
start <-Peaks[i,2]
end <-Peaks[i,3]
Subset <- subset(Merged,Merged$Peaks_Range.start == start)
maximum <- as.numeric(max(Subset$score))
summit <- Subset[which(Subset$score == maximum),]
write.table(summit,'peaks_list.bed', sep="\t", append=T, col.name = FALSE, row.names = FALSE, quote = FALSE)
}
It works (I think) but this is very very slow so I search an alternative way to do it.
Does anyone have any idea?