0

I am trying to do a (bedtools) intersect. Anyone see what is wrong with my bash code here?

#!/bin/bash

dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds

for file in $query/*; do 
   bedtools intersect -wa -wb -a $file -b $pgs/* -c -C -sorted -filenames
done > ${file%.*}-results.txt

"query" directory contains many files which need to each be iteratively queried against many files in "pgs" directory using the package bedtools and command intersect, which allows an asterisk for file "-b" to cycle multiple files.

Per usage:

One or more BAM/BED/GFF/VCF file(s) “B”. Use “stdin” if passing B with a UNIX pipe.
NEW!!!: -b may be followed with multiple databases and/or wildcard (*) character(s).

I believe the problem is with $pgs/*, looping one file against multiple files in another directory and I may need to restructure my loop to change directory or something.

UPDATE:

I have updated the script to work and was able to get everything working with:

#!/bin/bash -x
dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds
for f in $(ls $query/*.bed); do 
   for g in $(ls $pgs/*); do
      bedtools intersect -wa -wb -a $f -b $g* -C -filenames
   done > $(basename $f .ext).txt
done
clemaster
  • 1
  • 1

1 Answers1

0

In your code the output of the whole loop is redirected to one file. Maybe you want something like below (redirect the output of each bedtool to a separate file).

Also note that if you use ${file%.*} the outputs will be created in the $query directory so executing the script twice might result in unexpected results. I would recommend to use $(basename $file .ext).txt (where .ext is the extension of the files in $query. The basename command keeps only the name of the file (so without the directory) and also removes the .ext if present.

#!/bin/bash

dir=$(pwd)
query=$dir/query_beds
pgs=$dir/pgs_beds

for file in $query/*; do 
   bedtools intersect -wa -wb -a $file -b $pgs/* -c -C -sorted -filenames > ${file%.*}-results.txt
done 
vladmihaisima
  • 2,119
  • 16
  • 20
  • While that is good information, the resulting error is not with the output file. I'm getting: "command not found 7: /*. Exiting.e to open file /Users/clemaster/.." – clemaster Nov 22 '22 at 23:13
  • I would recommend using "#!/bin/bash -x" on the first line of the file. This should show you all commands executed. Also, what version of bash are you using? – vladmihaisima Nov 22 '22 at 23:32
  • I am using version 3.2.57. I've added both your suggested alternative output method and -x to the first line, neither seems to produce a different result. – clemaster Nov 22 '22 at 23:45
  • Bash 3.2 is from 2006 see https://en.wikipedia.org/wiki/Bash_(Unix_shell). That's very old! As I have no access to such an old version (mine is 5.1), I can't help. I would recommend you to somehow update bash or use a system with a newer version. – vladmihaisima Nov 22 '22 at 23:59
  • I am now using version 5.2. It still yields the same error. A Biostars user thinks the for loop needs to contain a second for loop, one for each directory, and should use $(ls $query/*.bed) instead. I'll be sure to update here if there is any resolution. – clemaster Nov 23 '22 at 03:23
  • Another suggestions: if you try to execute "bedtools" in the same folder does it work? Command not found could suggest that bedtools is not found. If I run the script: `for F in local_dir/*; do echo $F/*; done` for me works as expected... – vladmihaisima Nov 23 '22 at 08:42
  • Bedtools was in the path. I did find that \r was showing up in the path when I started fixing the issue, which is a result from writing the script on my PC and running it on my mac. Heads up that the dos2unix package will clear up those issues. It turns out I needed to nest a loop within a loop. My resulting script that works has been added to the original post. Thanks for your time and help! – clemaster Nov 23 '22 at 19:46