
I need to parse log files that record commands run in the following format:

cmd options < stdin > stdout 2>> stderr

Some commands don't use stdin; some don't use stdout.

I can easily separate the cmd, but I'm having trouble writing a regex that gives me the other portions.

I know how to match up until a single string:

How to match "anything up until this sequence of characters" in a regular expression?

I don't know how to match up until string a OR string b.

That is, I want to match options up until either < or > or 2>> occurs.

Trying something like the following doesn't work.

import re

test = "cmd test1 test2 -c test3 < infile > outfile 2>> err"

optRegex = '.+?(?=>|<|(2>>))'
optRegex = re.compile(optRegex)

stdoutRegex = '>+?(?=>|<|(2>>))'
stdoutRegex = re.compile(stdoutRegex)

# get options
result = optRegex.search(test)
options = result.group()
rest = test[len(options):]
options = options.rstrip()

# get stdout
result = stdoutRegex.search(rest)
stdout = result.group()
rest = rest[len(stdout):]
stdout = stdout.rstrip()


print(options)
print(stdout)
print(rest)

Output:

cmd test1 test2 -c test3
>
 infile > outfile 2>> err
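
The output above is consistent with what the second pattern says: `>+?` matches one or more literal `>` characters, not "a `>` followed by text", so the first place its lookahead succeeds is the first `>` of `2>>`. Matching up until string a OR string b can be done with an alternation inside the lookahead. What follows is a minimal sketch, not a definitive solution. It assumes only the three operators `<`, `>` and `2>>` appear and that each is followed by a single target without whitespace; the names `OPTIONS`, `REDIRECT` and `split_command` are hypothetical.

```python
import re

# Lazily match everything up to the first redirection operator, or up to
# the end of the line when there is none. "2>>" is listed before ">" so
# the leading "2" is treated as part of the operator.
OPTIONS = re.compile(r".+?(?=\s*(?:2>>|<|>|$))")

# One redirection: the operator, optional whitespace, then its target.
# Assumption: targets contain no whitespace.
REDIRECT = re.compile(r"(2>>|<|>)\s*(\S+)")


def split_command(line):
    """Return (command and options, {operator: target}) for one log line."""
    m = OPTIONS.match(line)
    head = m.group() if m else ""
    redirects = dict(REDIRECT.findall(line[len(head):]))
    return head, redirects


test = "cmd test1 test2 -c test3 < infile > outfile 2>> err"
head, redirects = split_command(test)
print(head)
print(redirects)
```

Because the lookahead also accepts the end of the line, commands that use no stdin or no stdout still yield their options, and the missing operators are simply absent from the dictionary.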

In hindsight, this is probably easier with a loop and scanning for start and end characters, but I'm curious about a regex solution.

Thanks!

andy
  • `^(.+?)(< .+?)?(> .+?)?(2>> .+)$` should capture the four sections of your input into separate groups. – CAustin May 09 '19 at 23:09
  • Perhaps match until the first occurrence of either `<`, `>`, or `2>>`, then match what comes after and split on any of those – The fourth bird May 10 '19 at 08:37
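
The two comments above can be tried against the sample line roughly as follows. This is only a sketch: the first pattern is copied unchanged from the comment, and the split pattern is one assumed reading of the second comment. The variable names are hypothetical. Note that, as written, the last group of the first pattern is not optional, so a line without `2>>` would not match it.

```python
import re

test = "cmd test1 test2 -c test3 < infile > outfile 2>> err"

# First comment: one pattern with a group per section. Groups for a
# missing "<" or ">" part come back as None, so they are filtered out.
four_groups = re.compile(r"^(.+?)(< .+?)?(> .+?)?(2>> .+)$")
match = four_groups.match(test)
sections = [g.strip() for g in match.groups() if g is not None]

# Second comment (assumed reading): split on the operators. The capturing
# group makes re.split keep each operator in the result list.
parts = re.split(r"\s*(2>>|<|>)\s*", test)
options = parts[0]
redirects = dict(zip(parts[1::2], parts[2::2]))

print(sections)
print(options)
print(redirects)
```

The split variant does not depend on the order of the operators or on all of them being present, which fits logs where some commands skip stdin or stdout.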

0 Answers