Python multiline regex groups with finditer only returns last match

Question

I have a repeating text output, and I want to capture five groups from each repetition. The pattern stretches across several newlines. I want to get an iterator of tuples. I tried this, but it only seems to capture the last match. Trying findall also returns a list containing only the last tuple:

import re
string = '''-----------------------------------------------------------------------
Selecting top 2 features.
Top features (not sorted): CXVol,CCVol
Total prediction score (mean accuracy): 0.611111
              precision    recall  f1-score   support

           1       0.62      0.83      0.71         6
           2       1.00      0.50      0.67         6
           3       0.43      0.50      0.46         6

    accuracy                           0.61        18
   macro avg       0.68      0.61      0.61        18
weighted avg       0.68      0.61      0.61        18

Ranking of other features (sorted): IL10,IL5,R2GP,R2Thal,FACC,IL6,FASTR,R2CC,p75,TNF,GPVol,R2STR,ODISTR,STRVol,R2CC,FACX,ILB,ODIGP,FAHIPP,MDThal,FAThal,IL2,MDCC,MDSTR,MDGP
-----------------------------------------------------------------------

-----------------------------------------------------------------------
Selecting top 3 features.
Top features (not sorted): CXVol,CCVol,IL10
Total prediction score (mean accuracy): 0.666667
              precision    recall  f1-score   support

           1       0.60      1.00      0.75         6
           2       0.75      0.50      0.60         6
           3       0.75      0.50      0.60         6

    accuracy                           0.67        18
   macro avg       0.70      0.67      0.65        18
weighted avg       0.70      0.67      0.65        18

Ranking of other features (sorted): IL5,R2GP,R2Thal,FACC,IL6,FASTR,R2CC,p75,TNF,GPVol,R2STR,ODISTR,STRVol,R2CC,FACX,ILB,ODIGP,FAHIPP,MDThal,FAThal,IL2,MDCC,MDSTR,MDGP
-----------------------------------------------------------------------

-----------------------------------------------------------------------
Selecting top 4 features.
Top features (not sorted): CXVol,CCVol,IL5,IL10
Total prediction score (mean accuracy): 0.611111
              precision    recall  f1-score   support

           1       0.60      1.00      0.75         6
           2       0.75      0.50      0.60         6
           3       0.50      0.33      0.40         6

    accuracy                           0.61        18
   macro avg       0.62      0.61      0.58        18
weighted avg       0.62      0.61      0.58        18

Ranking of other features (sorted): R2GP,R2Thal,FACC,IL6,FASTR,R2CC,p75,TNF,GPVol,R2STR,ODISTR,STRVol,R2CC,FACX,ILB,ODIGP,FAHIPP,MDThal,FAThal,IL2,MDCC,MDSTR,MDGP
-----------------------------------------------------------------------
'''

# Raw strings keep \s, \d and \) from being read as string escapes.
p = re.compile(r".*top\s(\d+)\sf"
               r".*Top.*ed\):\s(\S+)\n"
               r".*curacy\):\s(\S+)\n"
               r".*hted\savg\s+(\S+)\s+(\S+)", re.S)
m = p.finditer(string)
for x in m:
    print(x.groups())

#Out ('4', 'CXVol,CCVol,IL5,IL10', '0.611111', '0.62', '0.61')

Try splitting it on the horizontal bars before using the regex! — ti7, Jan 13 '21 at 19:08
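A minimal sketch of the splitting suggestion above, assuming each record is delimited by a line made up only of dashes. The `sample` text is a trimmed copy of two records from the question, and the names `sample`, `blocks`, and `results` are illustrative only. Since each block holds a single record, the question's greedy pattern can be searched once per block.

```python
import re

# Trimmed stand-in for the question's `string`: two records, same layout,
# with most of the classification-report rows left out for brevity.
sample = """-----------------------------------------------------------------------
Selecting top 2 features.
Top features (not sorted): CXVol,CCVol
Total prediction score (mean accuracy): 0.611111
              precision    recall  f1-score   support

weighted avg       0.68      0.61      0.61        18

-----------------------------------------------------------------------

-----------------------------------------------------------------------
Selecting top 3 features.
Top features (not sorted): CXVol,CCVol,IL10
Total prediction score (mean accuracy): 0.666667
              precision    recall  f1-score   support

weighted avg       0.70      0.67      0.65        18

-----------------------------------------------------------------------
"""

# Split on lines that consist only of dashes and drop the empty pieces.
blocks = [b for b in re.split(r"^-+$", sample, flags=re.M) if b.strip()]

# Same pattern as in the question (greedy), written with raw strings.
pattern = re.compile(
    r".*top\s(\d+)\sf"
    r".*Top.*ed\):\s(\S+)\n"
    r".*curacy\):\s(\S+)\n"
    r".*hted\savg\s+(\S+)\s+(\S+)",
    re.S,
)

# One search per block; skip any block that does not match.
results = []
for block in blocks:
    match = pattern.search(block)
    if match:
        results.append(match.groups())

for row in results:
    print(row)
```

Splitting first keeps the greedy `.*` from reaching into a neighbouring record, at the cost of an extra pass over the text.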
See https://regex101.com/r/ZN2hXp/1, use `.*?` instead of `.*` — Wiktor Stribiżew, Jan 13 '21 at 19:18
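A short sketch of the `.*?` suggestion above. With `re.S`, a greedy `.*` first runs to the end of the string and backtracks only as far as it must, so the leading `.*` swallows the earlier records and the single match lands on the last one. The lazy `.*?` stops at the first position where the rest of the pattern can match, so `finditer` yields one match per record. The `sample` text is a trimmed copy of two records from the question, and the names `sample`, `pattern`, and `results` are illustrative only.

```python
import re

# Trimmed stand-in for the question's `string`: two records, same layout,
# with most of the classification-report rows left out for brevity.
sample = """-----------------------------------------------------------------------
Selecting top 3 features.
Top features (not sorted): CXVol,CCVol,IL10
Total prediction score (mean accuracy): 0.666667
              precision    recall  f1-score   support

weighted avg       0.70      0.67      0.65        18

-----------------------------------------------------------------------

-----------------------------------------------------------------------
Selecting top 4 features.
Top features (not sorted): CXVol,CCVol,IL5,IL10
Total prediction score (mean accuracy): 0.611111
              precision    recall  f1-score   support

weighted avg       0.62      0.61      0.58        18

-----------------------------------------------------------------------
"""

# The question's pattern with every `.*` changed to the lazy `.*?`.
pattern = re.compile(
    r".*?top\s(\d+)\sf"
    r".*?Top.*?ed\):\s(\S+)\n"
    r".*?curacy\):\s(\S+)\n"
    r".*?hted\savg\s+(\S+)\s+(\S+)",
    re.S,
)

# finditer now produces one match object per record.
results = [m.groups() for m in pattern.finditer(sample)]

for row in results:
    print(row)
```

The last tuple here is the same one the question's greedy version printed as its only result.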
