Python print both the matching groups in regex

Question

I want to find two fixed patterns in a log file. Here is what a line in the log file looks like:

passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362

From this line, I want to extract drmanhattan and 362, the number at the very end of the line.

Here is what I have tried so far.

import sys
import re

with open("Xavier.txt") as f:
    for line in f:
        match1 = re.search(r'((\w+_\w+)|(\d+$))',line)
        if match1:
            print(match1.groups())

However, every time I run this script, I only get drmanhattan as output, not drmanhattan and 362.

Is it because of the | sign?

How do I tell the regex to capture this group and that group?

I have already consulted this link and this link; however, they did not solve my problem.

score 1 · Answer 1 · answered Aug 03 '16 at 09:31

| means OR, so your regex matches either (\w+_\w+) or (\d+$).
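To see the effect concretely, here is a small sketch (Python 3, using the sample line from the question) of what the question's alternation pattern returns. re.search stops at the first position where either alternative matches, so only the \w+_\w+ branch takes part and the \d+$ group is None:

```python
import re

line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"

# The question's pattern: an alternation wrapped in an outer group.
m = re.search(r'((\w+_\w+)|(\d+$))', line)

# Group 1 is the outer group, group 2 the left branch, group 3 the right branch.
# Only the left branch matched, so group 3 did not participate.
print(m.groups())  # ('drmanhattan_resources', 'drmanhattan_resources', None)
```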

Maybe you want something like this:

((\w+_\w+).*?(\d+$))
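A quick check of this pattern, sketched in Python 3 with the sample line from the question. Because the whole pattern is wrapped in an extra pair of parentheses, groups() returns three items; group(2) and group(3) hold the two values asked for:

```python
import re

line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"

m = re.search(r'((\w+_\w+).*?(\d+$))', line)
if m:
    # Group 1 is the outer group: everything from the name to the final number.
    print(m.group(1))  # drmanhattan_resources.log Aug 23 04:19:37 84526 362
    # Groups 2 and 3 are the two values of interest.
    print(m.group(2), m.group(3))  # drmanhattan_resources 362
```

Dropping the outer parentheses leaves just the two groups of interest.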

answered Aug 03 '16 at 09:31

baddger964

Julien Spronck · Accepted Answer · 2016-08-03T09:45:59.493

import re

line = 'Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362'

match1 = re.search(r'(\w+_\w+).*?(\d+$)', line)
if match1:
    print(match1.groups())
    # ('drmanhattan_resources', '362')

If you have a test.txt file that contains the following lines:

Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 363
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 361

you can do:

import re

with open('test.txt', 'r') as fil:
    for line in fil:
        # \s* tolerates trailing whitespace before the end of the line
        match1 = re.search(r'(\w+_\w+).*?(\d+)\s*$', line)
        if match1:
            print(match1.groups())
# ('drmanhattan_resources', '362')
# ('drmanhattan_resources', '363')
# ('drmanhattan_resources', '361')
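A variant of the loop above, sketched in Python 3 under two assumptions: io.StringIO stands in for test.txt so the snippet runs on its own, and the group names name and num are an illustrative addition that the answer does not use. Named groups let you read the two values by name instead of by position:

```python
import io
import re

# Hypothetical stand-in for test.txt (the three sample lines above);
# the second line deliberately ends with a trailing space.
sample = io.StringIO(
    "Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362\n"
    "Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 363 \n"
    "Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 361\n"
)

# Named groups (illustrative names) for the resource name and the final number;
# \s*$ tolerates trailing whitespace before the end of the line.
pattern = re.compile(r'(?P<name>\w+_\w+).*?(?P<num>\d+)\s*$')

results = []
for line in sample:
    m = pattern.search(line)
    if m:
        results.append((m.group('name'), m.group('num')))

for name, num in results:
    print(name, num)
```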

edited Aug 03 '16 at 09:45

answered Aug 03 '16 at 09:32

Julien Spronck

I am reading multiple lines from a file (each has the same search pattern), and with this solution only the last one gives me output. :( – Recker Aug 03 '16 at 09:37
@Recker This should also work for every line of the file; maybe you have trailing spaces or other additional characters. See the last edit. – Julien Spronck Aug 03 '16 at 09:46
Got it. I used `strip()` on the line object to get rid of extra spaces and it worked like a charm. – Recker Aug 03 '16 at 09:50
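To illustrate the issue in these comments, a small Python 3 sketch (the short sample string is made up for illustration): without re.MULTILINE, $ matches only at the end of the string or just before a final \n, not before a trailing space or \r, so \d+$ fails on such lines. Loosening the pattern to (\d+)\s*$, as in the edited answer, or calling strip() first, as Recker did, both fix it:

```python
import re

# Hypothetical line with a trailing space and a Windows-style line ending.
raw = "drmanhattan_resources.log Aug 23 04:19:37 84526 362 \r\n"

strict = r'(\w+_\w+).*?(\d+$)'
loose = r'(\w+_\w+).*?(\d+)\s*$'

# $ cannot match before " \r", so the strict pattern finds nothing.
print(re.search(strict, raw))  # None

# Allowing optional whitespace before the end fixes it...
print(re.search(loose, raw).groups())  # ('drmanhattan_resources', '362')

# ...and so does stripping the line before searching.
print(re.search(strict, raw.strip()).groups())  # ('drmanhattan_resources', '362')
```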

score 1 · Answer 3 · answered Aug 03 '16 at 09:37

With re.search you only get the first match, if any, and with | you tell re to look for either this or that pattern. As suggested in other answers, you could replace the | with .* to match "anything in between" those two patterns. Alternatively, you could use re.findall to get all matches:

>>> import re
>>> line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"
>>> re.findall(r'\w+_\w+|\d+$', line)
['drmanhattan_resources', '362']
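One caveat with re.findall, shown in a short Python 3 sketch: when the pattern contains capturing groups, findall returns one tuple of groups per match instead of the matched strings, with '' for groups that did not take part. That is why the findall pattern here has no parentheses. Compare it with the question's grouped pattern:

```python
import re

line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"

# Without groups: one string per match.
plain = re.findall(r'\w+_\w+|\d+$', line)
print(plain)  # ['drmanhattan_resources', '362']

# With the question's groups: one tuple per match, '' for unused groups.
grouped = re.findall(r'((\w+_\w+)|(\d+$))', line)
print(grouped)
# [('drmanhattan_resources', 'drmanhattan_resources', ''), ('362', '', '362')]
```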