Filter overlapping entries in bed file

Question

I have a bed file that looks like this:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -
1   187576  187587  chr1:187375-187577  0   -
1   187580  187590  chr1:187379-187577  0   -

My aim is to extract only those rows whose intervals do not overlap with any other row. For some time I have been trying `bedtools merge`, following the docs. I wanted to use its flags to count the entries that contributed to each "merged" fragment and then keep only those with a count of 1, but here comes the problem: I don't know how to keep the information about the strand, the score (this should always be 0) and the name (this might be reconstructed from the first 3 columns). Does anyone know how to put these things together?

The output should have exactly the same format as the input BED above, but contain only the rows that do not overlap with anything else:

1   183113  183114  chr1:183113-183240  0   +
1   187286  187287  chr1:187128-187287  0   -

Clarification: please include a sample of the desired input and output that are not identical. They can have the same format, but must not have entirely the same data. — agc, Apr 16 '17 at 16:31
I think you need `reduce` from [GenomicRanges package](http://bioconductor.org/packages/release/bioc/vignettes/GenomicRanges/inst/doc/GenomicRangesIntroduction.pdf). — zx8754, Apr 17 '17 at 20:00
Could this work (using the right options): `bedtools merge`, then `bedtools complement` on the result, then `bedtools intersect` of the original with the complement of the merge? — bli, Apr 18 '17 at 09:13

score 3 · Answer 1 · answered Apr 18 '17 at 16:42

OK, I worked this out:

1) Count how many input rows fall into each merged interval (`bedtools merge` expects the input to be sorted by chromosome, then start)

bedtools merge -i IN.bed -c 1 -o count > counted

2) Keep only the merged intervals with a count of 1, i.e. those built from a single row that does not overlap with anything

awk '/\t1$/{print}' counted > filtered

3) Intersect the filtered intervals with the original input and keep only the original rows that survived the filtering (`-wa` writes the full original row from IN.bed, so name, score and strand are preserved)

bedtools intersect -a IN.bed -b filtered -wa > OUT.bed
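
For a quick cross-check of the result without bedtools, the same idea can be sketched in plain sort + awk. This is not the pipeline above but a swapped-in sweep over the sorted rows. `keep_isolated` is a hypothetical helper name, and the sketch assumes whitespace-separated columns with chromosome, start and end in columns 1 to 3. Two caveats: it treats intervals that merely touch (the end of one equals the start of the next) as non-overlapping, whereas `bedtools merge` by default also merges book-ended intervals, and like the pipeline above it ignores strand.

```shell
#!/usr/bin/env bash
# keep_isolated FILE: print the rows of a BED-like file that overlap no other row.
# Hypothetical helper for illustration; it is not part of bedtools.
keep_isolated() {
    # Sort by chromosome, then numerically by start, so overlapping rows end up adjacent.
    sort -k1,1 -k2,2n "$1" | awk '
    {
        if (n > 0 && $1 == chrom && ($2 + 0) < cluster_end) {
            # Row starts before the current cluster ends: it overlaps, so extend the cluster.
            n++
            if (($3 + 0) > cluster_end) cluster_end = $3 + 0
        } else {
            # Row opens a new cluster; emit the previous one only if it held a single row.
            if (n == 1) print held
            chrom = $1 ""            # stored as a string so chromosome names compare as text
            cluster_end = $3 + 0     # stored as a number so coordinates compare numerically
            held = $0
            n = 1
        }
    }
    END { if (n == 1) print held }
    '
}
```

Called as `keep_isolated IN.bed > OUT.bed`, it prints the surviving rows unchanged (in sorted order), so name, score and strand are kept as they were. For the four sample rows in the question it should leave only the first two.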
