
I want to extract two fixed patterns from each line of a log file. A typical line looks like this:

passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362

From this line, I want to extract drmanhattan and 362, the number at the very end of the line.

Here is what I have tried so far.

import sys
import re

with open("Xavier.txt") as f:
    for line in f:
        match1 = re.search(r'((\w+_\w+)|(\d+$))',line)
        if match1:
            print match1.groups()

However, every time I run this script, I only get drmanhattan as output, not drmanhattan and 362.

Is it because of the | sign?

How do I tell the regex to capture this group and that group?

I have already consulted this and this link; however, they did not solve my problem.

Recker

3 Answers


| means OR, so your regex matches either (\w+_\w+) or (\d+$), whichever it finds first.

You probably want something like this instead:

((\w+_\w+).*?(\d+$))
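
To make the difference concrete, here is a small sketch that runs both patterns against the sample line from the question. It uses `print()` with a single argument so that it behaves the same in Python 2 and Python 3; the variable names `either` and `both` are only illustrative.

```python
import re

line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"

# Alternation: the search stops at the first position where either branch
# matches, so the (\d+$) branch never takes part and its group stays None.
either = re.search(r'((\w+_\w+)|(\d+$))', line)
print(either.groups())
# ('drmanhattan_resources', 'drmanhattan_resources', None)

# Sequence: both groups must match in a single pass, with .*? skipping
# over whatever text lies between them.
both = re.search(r'(\w+_\w+).*?(\d+$)', line)
print(both.groups())
# ('drmanhattan_resources', '362')
```

The outer pair of parentheses is dropped in the second pattern because the whole match is already available through `both.group(0)`.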
baddger964
import re

line = 'Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362'

match1 = re.search(r'(\w+_\w+).*?(\d+$)', line)
if match1:
    print match1.groups()
    # ('drmanhattan_resources', '362')

If you have a test.txt file that contains the following lines:

Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 363
Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 361

you can do:

with open('test.txt', 'r') as fil:
    for line in fil:
        match1 = re.search(r'(\w+_\w+).*?(\d+)\s*$', line)
        if match1:
            print match1.groups()
# ('drmanhattan_resources', '362')
# ('drmanhattan_resources', '363')
# ('drmanhattan_resources', '361')
Julien Spronck
  • I am reading multiple lines from a file (each has the same search pattern) and only last one fetches me the output using this solution. :( – Recker Aug 03 '16 at 09:37
  • @Recker This should also work for every line of the file; maybe you have spaces or other additional characters. See last edit. – Julien Spronck Aug 03 '16 at 09:46
  • Got it. I used `strip()` on the line object to get rid of extra spaces and it worked like a charm. – Recker Aug 03 '16 at 09:50
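
The trailing-whitespace issue from these comments can be reproduced with a short sketch. The `raw` line below is a made-up example with two spaces before the newline (the actual contents of the asker's file are not shown in the thread), and the names `strict` and `tolerant` are only illustrative.

```python
import re

# Hypothetical line: same format as above, but with trailing spaces before "\n".
raw = "Passed dangerb.xavier64.423181.r000.drmanhattan_resources.log Aug 23 04:19:37 84526 362  \n"

strict = re.compile(r'(\w+_\w+).*?(\d+$)')        # digits must touch the line end
tolerant = re.compile(r'(\w+_\w+).*?(\d+)\s*$')   # allows whitespace after the digits

# $ only tolerates a single trailing newline, so the extra spaces break the match.
print(strict.search(raw))
# None

# Either strip the line first, as in the last comment...
print(strict.search(raw.strip()).groups())
# ('drmanhattan_resources', '362')

# ...or let the pattern absorb the whitespace, as in the edited answer.
print(tolerant.search(raw).groups())
# ('drmanhattan_resources', '362')
```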

With re.search you only get the first match, if any, and with | you tell re to look for either one pattern or the other. As suggested in the other answers, you could replace the | with .* to match "anything in between" those two patterns. Alternatively, you could use re.findall to get all matches:

>>> line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"
>>> re.findall(r'\w+_\w+|\d+$', line)
['drmanhattan_resources', '362']
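
Note that all of the patterns above return drmanhattan_resources, while the question asks for drmanhattan alone. One possible way to get just that part is sketched below, under the assumption that the wanted name is whatever precedes the underscore in that token. The group names `name` and `count` are made up for this example.

```python
import re

line = "passed dangerb.xavier64.423181.k000.drmanhattan_resources.log Aug 23 04:19:37 84526 362"

# [^\W_]+ means "word characters except the underscore", so the first group
# stops right before "_resources". \s*$ tolerates trailing whitespace.
pattern = re.compile(r'(?P<name>[^\W_]+)_\w+.*?(?P<count>\d+)\s*$')

m = pattern.search(line)
if m:
    print("%s %s" % (m.group('name'), m.group('count')))
    # drmanhattan 362
```

Named groups are optional here; plain `(...)` groups read through `m.groups()` would work the same way.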
tobias_k