I have a BED file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose entries do not overlap with any others. For some time I have been trying `bedtools merge`, following the docs. I wanted to use its flags to count the entries that contributed to each "merged" fragment and then keep only those with a count of 1. Here is the problem: I don't know how to keep the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should be in exactly the same format as the input BED above, but contain only the rows that do not overlap with anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
Timur Shtatland
maciek
  • Please include a sample of the desired output. – agc Apr 16 '17 at 02:23
  • @agc : edited post - output is in the same format – maciek Apr 16 '17 at 10:37
  • Clarification: please include a sample of the desired input and output that are not identical. They can have the same format, but must not have entirely the same data. – agc Apr 16 '17 at 16:31
  • I think you need `reduce` from [GenomicRanges package](http://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf). – zx8754 Apr 17 '17 at 20:00
  • Could this work (using the right options): `bedtools merge`, then `bedtools complement` on the result, then `bedtools intersect` of the original with the complement of the merge ? – bli Apr 18 '17 at 09:13
  • @agc - I have changed the Input and added desired output. – maciek Apr 18 '17 at 15:57

1 Answer

OK, I worked this out:

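1) Count how many original entries make up each merged interval (`bedtools merge` expects the input to be sorted by chromosome and start position)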

bedtools merge -i IN.bed -c 1 -o count > counted

2) Keep only the merged intervals built from a single entry, i.e. those that do not overlap with anything

awk '/\t1$/{print}' counted > filtered

3) Intersect the filtered intervals with the original input, keeping only the original rows that survived the filtering

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
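
For comparison, the same idea can be sketched without bedtools, using only `sort` and `awk`. This is not the method above, just a rough equivalent under stated assumptions: the file is tab-separated, intervals are half-open (start inclusive, end exclusive), and strand is ignored. The function name `non_overlapping` is made up for illustration. It sorts the rows, sweeps through them while grouping overlapping entries into clusters, and prints a row only when its cluster contains that row alone.

```shell
# Hypothetical helper (not part of bedtools): print the BED rows that
# overlap no other row. Assumes tab-separated input and half-open intervals.
non_overlapping() {
  tab=$(printf '\t')
  # Sort by chromosome, then start, then end, so overlapping rows are adjacent.
  sort -t "$tab" -k1,1 -k2,2n -k3,3n "$1" | awk -F'\t' '
    # Print the stored row only if its cluster holds exactly one entry.
    function emit() { if (n == 1) print first }

    # Same chromosome and start before the running cluster end:
    # the row overlaps the current cluster, so extend the cluster.
    NR > 1 && ($1 "") == chrom && ($2 + 0) < max_end {
      n++
      if (($3 + 0) > max_end) max_end = $3 + 0
      next
    }

    # Otherwise close the previous cluster and start a new one with this row.
    {
      emit()
      chrom = $1 ""
      max_end = $3 + 0
      n = 1
      first = $0
    }

    END { emit() }
  '
}
```

It could be called as `non_overlapping IN.bed > OUT.bed`. Two differences from the bedtools pipeline are worth keeping in mind: the rows come out in sorted order rather than input order, and book-ended intervals (one ending exactly where the next starts) are treated here as non-overlapping, whereas `bedtools merge` with its default distance of 0 should, as far as I know, merge them.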
maciek