I am working on 4C data in a .txt file with the columns chromosome, start, end, nReads, RPMs, p.value and q.value. I am only interested in significant interactions on chr15, and later want to filter the interactions that are farther than 100 kb or nearer than 3 kb.

library(r3Cseq)
library(BSgenome.Hsapiens.UCSC.hg19.masked)
library(GenomicRanges)
library(Homo.sapiens)

kura.int <- read.table("KURA_DpnII.interaction.txt", header = T)
kura_data <- kura.int[kura.int$chromosome == "chr15" & kura.int$q.value > 0.1, ]
kura.int.gr <- makeGRangesFromDataFrame(kura_data, keep.extra.columns = T)

id <- "91433"
rccdGene <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene,
                  filter=list(gene_id=id))

rccdPromoter <- start(rccdGene)
kura_end <- ((rccdPromoter+kura_data$end)/2)

kura <- cbind(rccdPromoter, kura_end)
kura_2 <- cbind(kura, kura_data$chromosome)
colnames(kura_2) <- c("start", "end", "chr")

kura_3 <- kura_2[distance(kura_2$start, kura_2$end)<=100000]

The "kura_2" matrix has three columns, "start", "end" and "chr": the new start is the promoter of the gene, and the end differs for each interaction. I wrote the block of code above, but at the filtering step, where I call the function "distance", I get this error:

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘distance’ for signature ‘"character", "character"’

This is the kura_2 matrix, with its three columns "start", "end" and "chr":

     start        end   chr
1 91498106   86026693 chr15
2 91498106   91466684 chr15
3 91498106   88330238 chr15
4 91498106 91488399.5 chr15
5 91498106 91491012.5 chr15
6 91498106   91768848 chr15

How do I filter the genomic interactions whose distance between start and end is more than 100 kb or less than 3 kb?

The new start is the promoter of the gene and the new end is ((start+end)/2), which is why I have float values; this makes it easy to plot interactions from my promoter (bait). Is there a better way to filter out the interactions? Thank you in advance.

  • Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including example data in a plain text format - for example the output from `dput(yourdata)`. We cannot copy/paste data from images. – neilfws Mar 22 '21 at 22:45
  • It's unclear from your example what is meant by an interaction, or what should be "more than 100kb", or how something can be less than 3kb between both start and end. Can you be more precise and provide better example data with examples of what would be kept and discarded by the filter. – neilfws Mar 22 '21 at 23:05
  • Thank you for your suggestion. I have revised my question. – koushik ayaluri Mar 22 '21 at 23:25
  • I think the error is because you are calling `distance` on a matrix with columns of type character. `distance` expects x = a GenomicRanges instance and y = a GRanges instance, not a matrix. Also note that when you use `cbind` you are converting variables of type numeric (start, end) to type character. – neilfws Mar 23 '21 at 00:09
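The coercion described in the last comment can be reproduced in base R. This is only a minimal sketch: the values below are made-up stand-ins for the promoter position and the interaction ends, and `m` and `df` are hypothetical names.

```r
# Hypothetical values standing in for the promoter position and interaction ends.
start <- c(91498106, 91498106)
end   <- c(86026693, 91466684)
chr   <- c("chr15", "chr15")

# cbind() on vectors builds a matrix, and a matrix holds a single type,
# so mixing numbers with strings turns every column into character.
m <- cbind(start, end, chr)
is.character(m)        # TRUE

# A data frame keeps each column's own type, so start and end stay numeric.
df <- data.frame(chr = chr, start = start, end = end)
is.numeric(df$start)   # TRUE
```

Building the table with `data.frame()` instead of `cbind()` keeps the coordinates numeric, so they can be compared or subtracted directly.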

1 Answer

Probably like this:


your.data <- cbind.data.frame(
    start = c(91498106, 91498106, 91498106, 91498106, 91498106,
              91498106, 91498106),
    end = c(55757151, 55757918, 55758715.5, 55776189, 55779372.5,
            55781096.5, 55791947),
    chr = c("chr15", "chr15", "chr15", "chr15", "chr15", "chr15", "chr15")
)

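library(dplyr)
# dplyr provides between(x, left, right); %between% is a data.table operator.
# abs() makes the filter work whether end lies upstream or downstream of start.
good.data <- your.data %>% filter(between(abs(start - end), 3e3, 100e3))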

I take it none of these rows should be kept, as they change from start to end by way more than 100kb?
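For comparison, here is the same idea on the six rows printed in the question, in base R only. It is a sketch under two assumptions: that "filter" means keeping interactions whose distance from the bait is between 3 kb and 100 kb, and that both bounds are inclusive. The names `kura`, `dist` and `kept` are made up for this example.

```r
# The six rows shown in the question, rebuilt as a data frame so that
# start and end stay numeric.
kura <- data.frame(
  chr   = "chr15",
  start = 91498106,
  end   = c(86026693, 91466684, 88330238, 91488399.5, 91491012.5, 91768848)
)

# Absolute distance between the bait (start) and each interaction end.
kura$dist <- abs(kura$end - kura$start)

# Keep rows whose distance is at least 3 kb and at most 100 kb.
kept <- kura[kura$dist >= 3e3 & kura$dist <= 100e3, ]
kept
```

With these values, rows 2, 4 and 5 are kept (distances of 31422, 9706.5 and 7093.5 bp); rows 1, 3 and 6 are more than 100 kb from the bait and are dropped.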

Sirius