I have a tab-delimited file like this:
1 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK V
2 MGNVFEKLFKSLFGKKEMRILMVGLDAAGKTTILYKLKLGEIVTTIPTIGFNVETVEYKNISFTVWDVGGQDKIRPLWRHYFQNTQGLIFVVDSNDRERVNEAREELTRMLAEDELRDAVLLVFVNKQDLPNAMNAAEITDKLGLHSLRQRNWYIQATCATSGDGLYEGLDWLSNQLKNQK M
.
.
And so on...
The first column is the line number, the second column is a protein sequence, and the third column is a single character: the pattern to find in that line's sequence.
The desired output would be something like this:
1:positions:4 23 43 53 56 65 68 91 92 100 120 123 125
2:positions:1 18 22 110 134
I have tried awk with the index function:
nawk -F'\t' -v p=$3 'index($2,p) {printf "%s:positions:", NR; s=$2; m=0; while((n=index(s, p))>0) {m+=n; printf "%s ", m; s=substr(s, n+1)} print ""}' "file.tsv"
However, it only works when I pass a literal character or string with -v, not when I use $3. How can I do this in a Unix environment? Thanks in advance.
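
One possible direction, offered as a sketch under assumptions rather than a confirmed solution: in `-v p=$3` the `$3` is expanded by the shell before awk starts, so it never refers to the third column. Reading the pattern from the third field inside the awk program itself might avoid that. The sketch below assumes a tab-separated file, a literal (non-regex) pattern in column 3, and uses column 1 as the label instead of NR. The function name `find_positions`, the file `sample.tsv`, and its shortened sequences are all made up for illustration, and plain `awk` is used where the attempt above uses `nawk`.

```shell
# Hypothetical two-line sample with shortened sequences (not the real data).
printf '1\tMGNVFEKLV\tV\n2\tMGNVFEKLM\tM\n' > sample.tsv

# find_positions is an illustrative helper name, not an existing tool.
find_positions() {
    awk -F'\t' '
    # Skip lines with an empty pattern or with no match at all.
    $3 != "" && index($2, $3) {
        p = $3; s = $2; m = 0; out = ""; sep = ""
        # index() returns the 1-based offset of p in s, or 0 when absent.
        while ((n = index(s, p)) > 0) {
            m += n                 # absolute position in the full sequence
            out = out sep m
            sep = " "
            s = substr(s, n + 1)   # keep searching after this match
        }
        print $1 ":positions:" out
    }' "$1"
}

find_positions sample.tsv
```

With the made-up sample this prints `1:positions:4 9` and `2:positions:1 9`. Because the pattern is taken per line from `$3`, no `-v` variable is needed.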