Below is my regex. It captures the second log pattern completely, as required, but for the first pattern it only captures up to the PID. The content after the PID has to be captured in the same named group, "message".

How can I reuse a named capture group that is defined in one place at another place in the regex?

(?P<timestamp>\w+\s\d+\s\d+:\d+:\d+)\s(?P<host>\w+)\s(?P<process>\w+\/\w+|\w+)\[(?P<process_id>\d+)\]\W+|(?P<message_id>\d+[a-zA-Z0-9]+).+:(?P<message>.*)

Text:

Jul 14 06:03:92 jhyhr0392 postfix/postdrop[9303]: **warning: unable to lookup public/pickup: Nosuch file or directory**
Jul 14 06:03:92 jhyhr0392 sendmail[9303]: 09AX982X4GT36: to=root, ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=53002, relay=[127.0.0.1] [127.0.0.1],dsn=4.0.0, state=Deferred: Connection refuse by [127.0.0.1]
Wiktor Stribiżew

1 Answer

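In your pattern, you can omit the last alternation | because the part that you want to match belongs to the same string. With the | in place, the regex matches either the part up to the PID or the message_id and message part, never both in a single match.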
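To see the effect of the top-level | in practice, here is a minimal Python sketch that runs the pattern from the question over both sample lines with `re.finditer`. The names `QUESTION_RE`, `LINE_1`, `LINE_2` and `matches_for` are made up for illustration, and the `**` emphasis markers are assumed not to be part of the real first log line, so they are left out.

```python
import re

# Pattern from the question, unchanged. The top-level "|" splits it into
# two independent alternatives.
QUESTION_RE = re.compile(
    r"(?P<timestamp>\w+\s\d+\s\d+:\d+:\d+)\s(?P<host>\w+)\s"
    r"(?P<process>\w+\/\w+|\w+)\[(?P<process_id>\d+)\]\W+"
    r"|(?P<message_id>\d+[a-zA-Z0-9]+).+:(?P<message>.*)"
)

# Sample lines from the question (emphasis markers omitted, see above).
LINE_1 = ("Jul 14 06:03:92 jhyhr0392 postfix/postdrop[9303]: warning: "
          "unable to lookup public/pickup: Nosuch file or directory")
LINE_2 = ("Jul 14 06:03:92 jhyhr0392 sendmail[9303]: 09AX982X4GT36: to=root, "
          "ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, "
          "pri=53002, relay=[127.0.0.1] [127.0.0.1],dsn=4.0.0, "
          "state=Deferred: Connection refuse by [127.0.0.1]")


def matches_for(line):
    """Return the named groups of every separate match found in the line."""
    return [m.groupdict() for m in QUESTION_RE.finditer(line)]


for line in (LINE_1, LINE_2):
    for groups in matches_for(line):
        # Each match fills only the groups of the alternative that matched.
        print({k: v for k, v in groups.items() if v is not None})
    print("---")
```

The first line produces a single match that stops after the PID, so `message` stays empty. The second line produces two separate matches, one per alternative, which is why it looks fully captured in a tester that shows all matches.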

If group message is always present, and message_id is optional:

(?P<timestamp>\w+\s\d+\s\d+:\d+:\d+)\s(?P<host>\w+)\s(?P<process>\w+\/\w+|\w+)\[(?P<process_id>\d+)\]\W+(?P<message_id>\d+[a-zA-Z0-9]+)?.+:(?P<message>.*)
  • (?P<timestamp>\w+\s\d+\s\d+:\d+:\d+)\s Match the timestamp format and a whitespace char
  • (?P<host>\w+)\s Match 1+ word chars and a whitespace char
  • (?P<process>\w+\/\w+|\w+) Match 1+ word chars, optionally followed by / and 1+ word chars
  • \[(?P<process_id>\d+)\]\W+ Match 1+ digits between [] followed by 1+ non-word chars
  • (?P<message_id>\d+[a-zA-Z0-9]+)? Optionally match 1+ digits followed by 1+ alphanumeric chars
  • .+: Match until the end of the line, and backtrack until the last occurrence of :
  • (?P<message>.*) Match the rest of the line
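As a rough check of the breakdown above, the following Python sketch applies the combined pattern to both sample lines from the question. `LOG_RE`, `SAMPLE_LINES` and `parse_line` are hypothetical names chosen for this example, and the `**` emphasis markers are assumed not to belong to the actual log text, so they are omitted.

```python
import re

# Combined pattern from the answer, unchanged: no alternation, optional message_id.
LOG_RE = re.compile(
    r"(?P<timestamp>\w+\s\d+\s\d+:\d+:\d+)\s(?P<host>\w+)\s"
    r"(?P<process>\w+\/\w+|\w+)\[(?P<process_id>\d+)\]\W+"
    r"(?P<message_id>\d+[a-zA-Z0-9]+)?.+:(?P<message>.*)"
)

# Sample lines from the question (emphasis markers omitted, see above).
SAMPLE_LINES = [
    "Jul 14 06:03:92 jhyhr0392 postfix/postdrop[9303]: warning: "
    "unable to lookup public/pickup: Nosuch file or directory",
    "Jul 14 06:03:92 jhyhr0392 sendmail[9303]: 09AX982X4GT36: to=root, "
    "ctladdr=root (0/0), delay=00:00:00, xdelay=00:00:00, mailer=relay, "
    "pri=53002, relay=[127.0.0.1] [127.0.0.1],dsn=4.0.0, "
    "state=Deferred: Connection refuse by [127.0.0.1]",
]


def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None


for line in SAMPLE_LINES:
    fields = parse_line(line)
    # message_id is None when the optional group did not take part in the match.
    print(fields["process"], fields["message_id"], repr(fields["message"]))
```

Both lines now yield a single match with `message` filled in. Because `.+:` backtracks to the last colon, `message` holds only the text after the final `:` (including the leading space), so for the first line it is " Nosuch file or directory" rather than the whole warning text.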

The fourth bird