I'm a chemist working with Potential Energy Distribution (PED) output. The output is messy (some lines have more columns than others), and a single file contains several analyses, so I'd like to start and stop parsing when I see specific keywords or markers such as "****".
Here is an example of my input:
Average max. Potential Energy <EPm> = 41.291
TED Above 100 Factor TAF=0.011
Average coordinate population 1.000
s 1 1.00 STRE 4 7 NH 1.015024 f3554 100
s 2 1.00 STRE 2 1 CH 1.096447 f3127 13 f3126 13 f3073 37 f3073 34
s 3 1.00 STRE 2 5 CH 1.094347 f3127 38 f3126 36 f3073 12 f3073 11
s 4 1.00 STRE 6 8 CH 1.094349 f3127 36 f3126 38 f3073 11 f3073 13
s 5 1.00 STRE 2 3 CH 1.106689 f2950 48 f2944 46
s 6 1.00 STRE 6 9 CH 1.106696 f2950 47 f2944 47
s 7 1.00 STRE 6 10 CH 1.096447 f3127 12 f3126 13 f3073 33 f3073 38
s 8 1.00 STRE 4 2 NC 1.450644 f1199 43 f965 39
s 9 1.00 STRE 4 6 NC 1.450631 f1199 43 f965 39
s 10 1.00 BEND 7 4 6 HNC 109.30 f1525 12 f1480 42 f781 18
s 11 1.00 BEND 1 2 3 HCH 107.21 f1528 33 f1525 21 f1447 12
s 12 1.00 BEND 5 2 1 HCH 107.42 f1493 17 f1478 36 f1447 20
s 13 1.00 BEND 8 6 10 HCH 107.42 f1493 17 f1478 36 f1447 20
s 14 1.00 BEND 3 2 5 HCH 108.14 f1525 10 f1506 30 f1480 14 f1447 13
s 15 1.00 BEND 9 6 8 HCH 108.13 f1525 10 f1506 30 f1480 14 f1447 13
s 16 1.00 BEND 10 6 9 HCH 107.20 f1528 33 f1525 21 f1447 12
s 17 1.00 BEND 6 4 2 CNC 112.81 f383 85
s 18 1.00 TORS 7 4 2 1 HNCH -172.65 f1480 10 f781 55
s 19 1.00 TORS 1 2 4 6 HCNC 65.52 f1192 27 f1107 14 f243 18
s 20 1.00 TORS 5 2 4 6 HCNC -176.80 f1107 17 f269 35 f243 11
s 21 1.00 TORS 8 6 4 2 HCNC -183.20 f1107 17 f269 35 f243 11
s 22 1.00 TORS 3 2 4 6 HCNC -54.88 f1273 26 f1037 22 f243 19
s 23 1.00 TORS 9 6 4 2 HCNC 54.88 f1273 26 f1037 22 f243 19
s 24 1.00 TORS 10 6 4 2 HCNC -65.52 f1192 30 f1107 18 f243 21
****
9 STRE modes:
1 2 3 4 5 6 7 8 9
8 BEND modes:
10 11 12 13 14 15 16 17
7 TORS modes:
18 19 20 21 22 23 24
19 CH modes:
2 3 4 5 6 7 11 12 13 14 15 16 18 19 20 21 22 23 24
0 USER modes:
alternative coordinates 25
k 10 1.00 BEND 7 4 2 HNC 109.30
k 11 1.00 BEND 1 2 4 HCN 109.41
k 12 1.00 BEND 5 2 4 HCN 109.82
k 13 1.00 BEND 8 6 4 HCN 109.82
k 14 1.00 BEND 3 2 1 HCH 107.21
k 15 1.00 BEND 9 6 4 HCN 114.58
k 16 1.00 BEND 10 6 8 HCH 107.42
k 18 1.00 TORS 7 4 2 5 HNCH -54.98
k 18 1.00 TORS 7 4 2 3 HNCH 66.94
k 18 1.00 OUT 4 2 6 7 NCCH 23.30
k 19 1.00 OUT 2 3 5 1 CHHH 21.35
k 19 1.00 OUT 2 1 5 3 CHHH 21.14
k 19 1.00 OUT 2 3 1 5 CHHH 21.39
k 20 1.00 OUT 2 1 4 5 CHNH 21.93
k 20 1.00 OUT 2 5 4 1 CHNH 21.88
k 20 1.00 OUT 2 1 5 4 CHHN 16.36
k 21 1.00 TORS 8 6 4 7 HCNH 54.98
k 21 1.00 OUT 6 10 9 8 CHHH 21.39
k 22 1.00 OUT 2 1 4 3 CHNH 20.12
k 22 1.00 OUT 2 5 4 3 CHNH 19.59
k 23 1.00 TORS 9 6 4 7 HCNH -66.94
k 23 1.00 OUT 6 8 4 9 CHNH 19.59
k 24 1.00 TORS 10 6 4 7 HCNH -187.34
k 24 1.00 OUT 6 9 4 10 CHNH 20.32
k 24 1.00 OUT 6 8 4 10 CHNH 21.88
I'd like to skip the first 3 lines (I know how to do that with skiprows=3), then stop parsing at the "****" line and fit the content into 11 columns with predefined names like "tVib1", "%PED1", "tVib2", "%PED2", etc.
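One possible way to approach this first block, as a minimal sketch rather than a definitive solution: read the file line by line, stop at the first line starting with asterisks, and split each row into a fixed head and a variable tail of frequency/percentage pairs. It assumes that the frequency tokens are the only tokens starting with a lowercase "f", and that the atom indices sit between the coordinate type and the atom label. The function name `read_ped_block` and the column names "coord", "weight", "type", "atoms", "label" and "value" are hypothetical; only "tVibN" and "%PEDN" come from the question.

```python
import pandas as pd

# Shortened sample in the same layout as the input shown above.
sample = """\
Average max. Potential Energy <EPm> = 41.291
TED Above 100 Factor TAF=0.011
Average coordinate population 1.000
s 1 1.00 STRE 4 7 NH 1.015024 f3554 100
s 5 1.00 STRE 2 3 CH 1.106689 f2950 48 f2944 46
s 10 1.00 BEND 7 4 6 HNC 109.30 f1525 12 f1480 42 f781 18
****
9 STRE modes:
"""

def read_ped_block(lines, skip=3, stop_prefix="***"):
    """Parse rows after `skip` header lines until a line starting with `stop_prefix`."""
    records = []
    for line in lines[skip:]:
        if line.startswith(stop_prefix):
            break
        tokens = line.split()
        if not tokens:
            continue
        # Assumption: the first token starting with "f" begins the (tVib, %PED) pairs.
        first_f = next((i for i, t in enumerate(tokens) if t.startswith("f")),
                       len(tokens))
        head, pairs = tokens[:first_f], tokens[first_f:]
        rec = {
            "coord": int(head[1]),
            "weight": float(head[2]),
            "type": head[3],
            "atoms": " ".join(head[4:-2]),  # 2, 3 or 4 atom indices
            "label": head[-2],
            "value": float(head[-1]),
        }
        # Number the pairs so short rows simply leave later columns empty (NaN).
        for n, (freq, pct) in enumerate(zip(pairs[0::2], pairs[1::2]), start=1):
            rec[f"tVib{n}"] = freq
            rec[f"%PED{n}"] = int(pct)
        records.append(rec)
    return pd.DataFrame(records)

df = read_ped_block(sample.splitlines())
print(df)
```

Keeping the atom indices in a single "atoms" column sidesteps the misalignment between STRE, BEND and TORS rows, which carry different numbers of atoms. For a real file, `open(path).read().splitlines()` could replace `sample.splitlines()`.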
After that, in the same file, I need to start parsing again after the line beginning with "alternative coordinates", also into 11 columns.
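For this second block, a hedged sketch of one option is to discard everything up to the marker line with `itertools.dropwhile` and then collect rows for as long as they start with "k". It assumes the marker line begins with "alternative coordinates" and that the block ends at the first line that is not a "k" row. The function name `read_alternative_block` and the column names are hypothetical.

```python
from itertools import dropwhile

import pandas as pd

# Shortened sample in the same layout as the input shown above.
sample = """\
0 USER modes:
alternative coordinates 25
k 10 1.00 BEND 7 4 2 HNC 109.30
k 18 1.00 TORS 7 4 2 5 HNCH -54.98
k 18 1.00 OUT 4 2 6 7 NCCH 23.30
"""

COLUMNS = ["coord", "weight", "type", "atoms", "label", "value"]

def read_alternative_block(lines, marker="alternative coordinates"):
    """Parse the "k" rows that follow the first line starting with `marker`."""
    # Skip every line before the marker, then consume the marker line itself.
    remaining = dropwhile(lambda line: not line.startswith(marker), lines)
    next(remaining, None)
    rows = []
    for line in remaining:
        tokens = line.split()
        if not tokens or tokens[0] != "k":
            break  # assumption: the block ends at the first non-"k" line
        rows.append({
            "coord": int(tokens[1]),
            "weight": float(tokens[2]),
            "type": tokens[3],
            "atoms": " ".join(tokens[4:-2]),  # 3 or 4 atom indices
            "label": tokens[-2],
            "value": float(tokens[-1]),
        })
    return pd.DataFrame(rows, columns=COLUMNS)

alt = read_alternative_block(sample.splitlines())
print(alt)
```

If the marker never appears, the function returns an empty frame with the same columns, so downstream code does not need a special case.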
This looks very hard for me to achieve.
Any help is much appreciated.