
I am really new to coding in general. I'm trying to figure out how to write a bash script that finds occurrences of patterns in a FASTA file. The patterns are in a .txt file and are pretty basic:

ATTG
CGCCT
TGGAC
GGCCA
etc.

I've been trying to write a bash script that looks like this:

while read -r pattern;
do
  grep -c -i "$pattern" myfile.fasta > acountoftheoccurances.txt
done < patternfile.txt

I thought that would return a list of all the patterns, each with a count, but all I got was the count of occurrences for the last pattern (I ran each pattern manually to see what I should expect).

Any help or pointing in the right direction would be appreciated!
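A minimal sketch of the loop with the redirection moved out of it, assuming bash and the file names from the question (the function name `count_each_pattern` is hypothetical). Inside the loop, `>` truncates the output file on every iteration, so only the last count survives; redirecting once keeps one line per pattern, and printing the pattern next to its count gives the list that was expected. Note that `grep -c` counts matching lines, not total occurrences, and that FASTA header lines are searched too.

```shell
#!/usr/bin/env bash
# Sketch: count_each_pattern is a hypothetical helper name.
# Usage: count_each_pattern PATTERNFILE FASTAFILE
# Prints "pattern<TAB>count" per pattern. As in the original loop, the
# count is the number of LINES containing the pattern (grep -c), and
# FASTA header lines (starting with ">") are included.
count_each_pattern() {
  local patterns=$1 fasta=$2 pattern
  # "|| [ -n ... ]" also handles a last line with no trailing newline
  while IFS= read -r pattern || [ -n "$pattern" ]; do
    pattern=${pattern%$'\r'}        # tolerate Windows (CRLF) line endings
    [ -n "$pattern" ] || continue   # skip blank lines
    # -F literal string, -i ignore case, -c count matching lines
    printf '%s\t%s\n' "$pattern" "$(grep -F -i -c -e "$pattern" -- "$fasta")"
  done < "$patterns"
}

# Redirect the call once, so the output file is truncated only once:
# count_each_pattern patternfile.txt myfile.fasta > acountoftheoccurances.txt
```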

tripleee
tazzy-mt
  • Each iteration of the loop overwrites the previous result file. You want to append with `>>`, or better, redirect the whole loop like `... done < patternfile.txt > acountoftheoccurances.txt` – tripleee Jan 24 '23 at 20:33
  • ... But fundamentally, processing the same file again and again for each pattern is very inefficient. You want to learn basic Awk so you can do it all in one pass. – tripleee Jan 24 '23 at 20:34
  • @Fravadona Thanks, I added a second duplicate which covers that (implicitly). – tripleee Jan 25 '23 at 05:15
  • If you do that, you may also count matches in the FASTA header lines. Make sure you exclude the header lines, which start with the ">" character. – Supertech Jan 26 '23 at 14:13
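Following the one-pass and header-exclusion comments above, here is a sketch in awk called from bash. It is an illustration under stated assumptions, not a definitive implementation: the function name `count_patterns_fasta` is hypothetical; patterns are matched case-insensitively as literal strings; lines starting with ">" are skipped; lines of the same record are searched together with the last few characters of the previous line, so a pattern that wraps across a line break is still found; and every occurrence is counted, including overlapping ones (`AA` in `AAA` counts twice). That last choice differs from `grep -c`, which counts matching lines.

```shell
#!/usr/bin/env bash
# Sketch: count_patterns_fasta is a hypothetical helper name.
# Usage: count_patterns_fasta PATTERNFILE FASTAFILE
# Prints "pattern<TAB>count", where count is the number of (possibly
# overlapping) occurrences in sequence lines; header lines are skipped.
count_patterns_fasta() {
  awk '
    # Count matches in buf for every pattern, ignoring matches that lie
    # entirely inside the first "skip" characters (already counted).
    function count_in(buf, skip,    i, p, L, s, pos, off) {
      for (i = 1; i <= n; i++) {
        p = order[i]; L = length(p)
        s = buf; off = 0
        while ((pos = index(s, p)) > 0) {
          if (off + pos + L - 1 > skip) count[p]++
          off += pos                  # step one past the match start,
          s = substr(s, pos + 1)      # so overlapping matches are found
        }
      }
    }
    # First file: the patterns, upper-cased, de-duplicated, in order.
    FILENAME == ARGV[1] {
      p = toupper($1); sub(/\r$/, "", p)
      if (p != "" && !(p in count)) {
        order[++n] = p; count[p] = 0
        if (length(p) > maxlen) maxlen = length(p)
      }
      next
    }
    # Header line: a new record starts, so drop the carried-over tail.
    /^>/ { tail = ""; next }
    # Sequence line: search it together with the last characters seen in
    # this record, so patterns that wrap across a line break are found.
    {
      line = toupper($0); gsub(/[ \t\r]/, "", line)
      buf = tail line
      count_in(buf, length(tail))
      k = maxlen - 1                # longest pattern minus one character
      tail = (length(buf) > k) ? substr(buf, length(buf) - k + 1) : buf
    }
    END { for (i = 1; i <= n; i++) printf "%s\t%d\n", order[i], count[order[i]] }
  ' "$1" "$2"
}
```

With the file names from the question, the call would be `count_patterns_fasta patternfile.txt myfile.fasta > acountoftheoccurances.txt`. Carrying over only the tail, instead of joining the whole record into one string, keeps each search short even for very long sequences.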
