
I need to find multiple patterns in a file with awk and count them. I don't want to write these patterns out manually. Is it possible to create a "pattern-searching loop"?

The file that contains the patterns is shown below (only the even-numbered lines are patterns):

>bc1001_5p
CACATATCAGAGTGCGTGGATTGATATGTAATACGACTCACTATAG
>bc1001_3p
CACATATCAGAGTGCGTCTCAGGCG
>bc1002_5p
ACACACAGACTGTGAGTGGATTGATATGTAATACGACTCACTATAG
>bc1002_3p
ACACACAGACTGTGAGTCTCAGGCG
>bc1003_5p
ACACATCTCGTGAGAGTGGATTGATATGTAATACGACTCACTATAG
>bc1003_3p
ACACATCTCGTGAGAGTCTCAGGCG
>bc1004_5p
CACGCACACACGCGCGTGGATTGATATGTAATACGACTCACTATAG
>bc1004_3p
CACGCACACACGCGCGTCTCAGGCG

I would like something like this:

awk '/the loop with all the patterns/ {count++} END{print count}' the_file_where_I_look_for_those_patterns

Thanks

Paillou
  • Thank you for sharing your efforts in form of code, could you please post samples of expected output in your question for making it more clear, thank you. – RavinderSingh13 Feb 25 '21 at 10:29
  • Actually, it will only output the number of occurrences. A simple number will be enough. – Paillou Feb 25 '21 at 10:31
  • Do you want exact matches or partial matches? – Andre Wildberg Feb 25 '21 at 10:36
  • I want exact matches. All the "ACTG" letters will be necessary. – Paillou Feb 25 '21 at 10:38
  • Read https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern and then replace "pattern" with "full-or-partial string-or-regexp" everywhere it occurs in your question so we can best help you. – Ed Morton Feb 25 '21 at 15:43

2 Answers


I want exact matches

You may try this awk command, which reads the pattern file first and then the main file:

awk 'FNR == NR {if (!(FNR%2)) patt[$0]; next} $0 in patt' patt.txt mainfile

This goes through the pattern file first and stores every even-numbered line as a key in the associative array patt. While reading the main file, it then prints each line that is exactly equal to one of the stored keys.
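Since the question asks for a single number rather than the matching lines, the same idea can be adapted to count. The following is a minimal sketch, not the answerer's own code: the temporary files and their contents are made up for illustration, and the `count` variable is an assumption.

```shell
# Hypothetical sample files, created only for this illustration.
tmpdir=$(mktemp -d)
printf '%s\n' '>bc1001_3p' 'CACATATCAGAGTGCGTCTCAGGCG' \
              '>bc1002_3p' 'ACACACAGACTGTGAGTCTCAGGCG' > "$tmpdir/patt.txt"
printf '%s\n' 'CACATATCAGAGTGCGTCTCAGGCG' \
              'GGGG' \
              'ACACACAGACTGTGAGTCTCAGGCG' \
              'CACATATCAGAGTGCGTCTCAGGCGA' > "$tmpdir/mainfile"

# First file: remember every even-numbered line as an array key.
# Second file: count the lines that are exactly equal to a stored key.
count=$(awk 'FNR == NR { if (FNR % 2 == 0) patt[$0]; next }
             $0 in patt { count++ }
             END { print count + 0 }' "$tmpdir/patt.txt" "$tmpdir/mainfile")

echo "$count"
rm -rf "$tmpdir"
```

The `count + 0` in the `END` block makes awk print `0` instead of an empty line when nothing matches. The last line of the sample main file has an extra trailing `A`, so it is not counted as an exact match.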

anubhava

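Another approach uses grep with fixed-string (-F), whole-line (-x) matching and a count of matching lines (-c), reading the patterns from the even-numbered lines of the pattern file: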

$ grep -Fxcf <(sed -n '2~2p' patterns_file) data_file
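The `2~2` step address is a GNU sed extension, and `<(...)` process substitution needs a shell such as bash, ksh or zsh. Where either is unavailable, a sketch along the following lines may work instead; it is an adaptation rather than the answerer's command, and the sample files and the intermediate `even_lines` file are hypothetical.

```shell
# Hypothetical sample files, created only for this illustration.
tmpdir=$(mktemp -d)
printf '%s\n' '>bc1001_3p' 'CACATATCAGAGTGCGTCTCAGGCG' \
              '>bc1002_3p' 'ACACACAGACTGTGAGTCTCAGGCG' > "$tmpdir/patterns_file"
printf '%s\n' 'CACATATCAGAGTGCGTCTCAGGCG' \
              'GGGG' \
              'ACACACAGACTGTGAGTCTCAGGCG' \
              'CACATATCAGAGTGCGTCTCAGGCGA' > "$tmpdir/data_file"

# Keep only the even-numbered lines (the sequences) of the pattern file.
awk 'NR % 2 == 0' "$tmpdir/patterns_file" > "$tmpdir/even_lines"

# -F: fixed strings, -x: whole-line match, -c: count matching lines,
# -f: read one pattern per line from the given file.
matches=$(grep -Fxc -f "$tmpdir/even_lines" "$tmpdir/data_file")

echo "$matches"
rm -rf "$tmpdir"
```

Note that `grep -c` counts matching lines, so a data line is counted once regardless of how many patterns it equals.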
karakfa