I am working with log files containing measurements taken from different samples (identified by float-like numbers 1.1, 1.2 ... 1.14), arranged in the following format:
Finding intramodel H-bonds
Constraints relaxed by 0.5 angstroms and 20 degrees
Models used:
1.1 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.2 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.3 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.4 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.5 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.6 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.7 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.8 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.9 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.10 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.11 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.12 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.13 SarsCov2_structure19R_nsp5holo_rep1.pdb
1.14 SarsCov2_structure19R_nsp5holo_rep1.pdb
16 H-bonds
H-bonds (donor, acceptor, hydrogen, D..A dist, D-H..A dist):
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/? HIS 163 NE2 SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/A UNL 888 S no hydrogen 3.850 N/A
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/? GLU 166 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/A UNL 888 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/? GLU 166 H 2.909 2.070
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/? CYS 44 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.1/A UNL 888 H 2.798 1.892
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.2/? GLN 189 NE2 SarsCov2_structure19R_nsp5holo_rep1.pdb #1.2/A UNL 888 S SarsCov2_structure19R_nsp5holo_rep1.pdb #1.2/? GLN 189 1HE2 3.896 2.916
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/? GLU 166 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/A UNL 888 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/? GLU 166 H 2.673 1.892
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/? CYS 44 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.3/A UNL 888 H 3.071 2.338
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.4/? HIS 163 NE2 SarsCov2_structure19R_nsp5holo_rep1.pdb #1.4/A UNL 888 S no hydrogen 3.927 N/A
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.4/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.4/? THR 190 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.4/A UNL 888 H 3.029 2.173
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.8/? GLN 189 NE2 SarsCov2_structure19R_nsp5holo_rep1.pdb #1.8/A UNL 888 S SarsCov2_structure19R_nsp5holo_rep1.pdb #1.8/? GLN 189 2HE2 3.631 2.751
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/? CYS 145 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/A UNL 888 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/? CYS 145 H 2.966 2.210
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/? ARG 188 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.9/A UNL 888 H 3.067 2.307
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.10/? GLN 189 NE2 SarsCov2_structure19R_nsp5holo_rep1.pdb #1.10/A UNL 888 S SarsCov2_structure19R_nsp5holo_rep1.pdb #1.10/? GLN 189 2HE2 3.693 2.786
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.11/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.11/? THR 190 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.11/A UNL 888 H 3.159 2.268
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.12/? GLU 166 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.12/A UNL 888 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.12/? GLU 166 H 2.648 1.817
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.13/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.13/? THR 190 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.13/A UNL 888 H 3.176 2.395
SarsCov2_structure19R_nsp5holo_rep1.pdb #1.14/A UNL 888 N SarsCov2_structure19R_nsp5holo_rep1.pdb #1.14/? PHE 140 O SarsCov2_structure19R_nsp5holo_rep1.pdb #1.14/A UNL 888 H 2.833 1.955
I need to print the number of the sample (1-14) that corresponds to the first occurrence of two patterns together: "GLU 166 N" as well as "CYS 44 O", with no other patterns within the same sample. The number to print is the one found on the same line just before the pattern, in the form #1.number/?. So in this example the detected number should be 3 (since the associated ID is #1.3/?), because that is the first sample where both patterns (and no others!) are found. Finally, if no sample contains both patterns, I would like to print the number of the sample with the first occurrence of "GLU 166 N" (like in my example).
Presently my AWK solution handles only a single-pattern search: it looks for the first occurrence of "GLU 166 N" (if the pattern cannot be found, the script prints 1). Basically, it looks for the pattern anywhere on the line and then prints the second part of the number (after the dot) from the 2nd field:
awk -vn=1 '/GLU 166 N/ {gsub(/.*\.|\/\?/,"",$2); n=$2; exit} END {print n}' input.log
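
For reference, here is a rough sketch of how the two-pattern logic described above might look in awk. It is not a tested solution, and it rests on a few assumptions: that "no other patterns" means the sample has exactly two H-bond lines (one with "GLU 166 N", one with "CYS 44 O"), that every H-bond line carries the model ID (#1.number/...) in its 2nd field, and that the default of 1 from my current script should be kept when nothing matches. The function name first_sample is made up for illustration.

```shell
#!/bin/sh
# first_sample is a hypothetical wrapper name, not part of any existing tool.
first_sample() {
    awk '
    # H-bond records have the model ID (e.g. "#1.3/?" or "#1.3/A") in field 2.
    $2 ~ /^#[0-9]+\.[0-9]+\// {
        id = $2
        sub(/^#[0-9]+\./, "", id)      # strip the leading "#1."
        sub(/\/.*$/, "", id)           # strip the trailing "/?" or "/A"

        if (!(id in total)) order[++k] = id   # remember order of first appearance
        total[id]++                           # number of H-bond lines per sample

        $1 = $1                        # collapse runs of whitespace to single spaces
        line = " " $0 " "              # pad so patterns match whole fields only
        if (index(line, " GLU 166 N ")) {
            glu[id] = 1
            if (first == "") first = id       # first sample with GLU 166 N (fallback)
        }
        if (index(line, " CYS 44 O ")) cys[id] = 1
    }
    END {
        # Assumed fallback: first GLU 166 N sample, else 1 as in the one-liner.
        ans = (first != "") ? first : 1
        for (i = 1; i <= k; i++) {
            id = order[i]
            # Assumption: exactly two lines and both patterns means "no others".
            if (total[id] == 2 && (id in glu) && (id in cys)) { ans = id; break }
        }
        print ans
    }' "$@"
}
```

Called as `first_sample input.log`, this should print 3 for the log shown above: sample 1.1 has both patterns but also a third line (HIS 163 NE2), so it is skipped, while sample 1.3 has exactly the two wanted lines.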