I need to parse log files that record commands in the following format:
cmd options < stdin > stdout 2>> stderr
Some commands don't use stdin; some don't use stdout.
I can easily separate out the cmd; I'm having trouble writing a regex that extracts the other portions.
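Because each redirection may be missing, one way to capture every portion in a single pass is to make each redirection an optional group after a lazy options group. This is a minimal sketch, assuming the redirections appear in the order shown above and that filenames contain no whitespace; `LINE_RE` and `parse_line` are illustrative names, not part of any library.

```python
import re

# Hypothetical pattern: a lazy options group, then each redirection as an
# optional group. Assumes the order < then > then 2>> and whitespace-free filenames.
LINE_RE = re.compile(
    r"(?P<options>.*?)"
    r"(?:\s*<\s*(?P<stdin>\S+))?"
    r"(?:\s*>\s*(?P<stdout>\S+))?"
    r"(?:\s*2>>\s*(?P<stderr>\S+))?"
    r"\s*"
)

def parse_line(line):
    """Return the four portions as a dict; missing redirections map to None."""
    m = LINE_RE.fullmatch(line)
    return m.groupdict() if m else None

print(parse_line("cmd test1 > outfile"))
# {'options': 'cmd test1', 'stdin': None, 'stdout': 'outfile', 'stderr': None}
```

The lazy `.*?` grows one character at a time until the optional groups plus the trailing `\s*` can consume the rest of the line, so the options end right before the first redirection that fits. A malformed redirection (for example a trailing `>` with no filename) is silently absorbed into `options` rather than raising an error.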
I know how to match up until a single string:
How to match "anything up until this sequence of characters" in a regular expression?
I don't know how to match up until string a OR string b.
That is, I want to match options up until either < or > or 2>> occurs.
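Matching "up until A or B" comes down to putting the alternatives inside the lookahead. This is a minimal sketch, assuming whitespace separates the options from the first redirection. The lazy `.*?` stops at the first position where optional whitespace is followed by `<`, `2>>`, `>`, or the end of the string. The `$` alternative covers lines with no redirection at all, where a lookahead that lists only the operators never matches. `OPTIONS_RE` is an illustrative name.

```python
import re

# Lazily consume characters, stopping at the first position where optional
# whitespace is followed by <, 2>>, > or the end of the string.
OPTIONS_RE = re.compile(r".*?(?=\s*(?:<|2>>|>|$))")

line = "cmd test1 test2 -c test3 < infile > outfile 2>> err"
m = OPTIONS_RE.match(line)
print(repr(m.group()))       # 'cmd test1 test2 -c test3'
print(repr(line[m.end():]))  # ' < infile > outfile 2>> err'

# Lines without any redirection still match, thanks to the $ alternative.
print(repr(OPTIONS_RE.match("cmd test1").group()))  # 'cmd test1'
```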
Trying something like the following doesn't work: it gets the options, but not stdout.
import re
test = "cmd test1 test2 -c test3 < infile > outfile 2>> err"
optRegex = '.+?(?=>|<|(2>>))'
optRegex = re.compile(optRegex)
stdoutRegex = '>+?(?=>|<|(2>>))'
stdoutRegex = re.compile(stdoutRegex)
# get options
result = optRegex.search(test)
options = result.group()
rest = test[len(options):]
options = options.rstrip()
# get stdout
result = stdoutRegex.search(rest)
stdout = result.group()
rest = rest[len(stdout):]
stdout = stdout.rstrip()
print(options)
print(stdout)
print(rest)
Output:
cmd test1 test2 -c test3
>
infile > outfile 2>> err
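A likely explanation for the `>` above, sketched below by reusing the pattern and the leftover string from the attempt: `>+?(?=>|<|(2>>))` only accepts a `>` that is immediately followed by `>`, `<` or `2>>`. The `>` before `outfile` is followed by a space, so the first acceptable `>` is the first character of `>>` in `2>>`. Separately, `rest[len(stdout):]` assumes the match starts at index 0, which is why only the leading `<` was removed.

```python
import re

# Pattern and leftover string copied from the attempt above.
stdoutRegex = re.compile('>+?(?=>|<|(2>>))')
rest = "< infile > outfile 2>> err"

m = stdoutRegex.search(rest)
# "> outfile" is followed by a space, so the first ">" that satisfies the
# lookahead is the first ">" of "2>>", well past index 0.
print(m.group(), m.start())  # > 20

# Slicing by len(m.group()) assumes the match began at index 0 and drops only
# the leading "<"; slicing at m.end() drops everything up to the match.
print(repr(rest[len(m.group()):]))  # ' infile > outfile 2>> err'
print(repr(rest[m.end():]))         # '> err'
```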
In hindsight, this is probably easier with a loop that scans for start and end characters, but I'm curious about a regex solution.
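One way to combine a regex with a small loop is to split the line on the redirection operators and pair each operator with the token that follows it. Because the operator is wrapped in a capturing group, `re.split` keeps it in the result list. This is a minimal sketch, assuming the options themselves never contain `<`, `>` or `2>>` (for example inside quotes); `SPLIT_RE`, `STREAMS` and `split_command` are hypothetical names.

```python
import re

# Capturing the operator makes re.split keep it in the result, so the list
# alternates: options, operator, target, operator, target, ...
SPLIT_RE = re.compile(r"\s*(2>>|<|>)\s*")
# Hypothetical mapping from operator to the stream it redirects.
STREAMS = {"<": "stdin", ">": "stdout", "2>>": "stderr"}

def split_command(line):
    options, *rest = SPLIT_RE.split(line.strip())
    parts = {"options": options, "stdin": None, "stdout": None, "stderr": None}
    # Pair each operator with the token that follows it, in any order.
    for op, target in zip(rest[::2], rest[1::2]):
        parts[STREAMS[op]] = target
    return parts

print(split_command("cmd test1 test2 -c test3 < infile > outfile 2>> err"))
# {'options': 'cmd test1 test2 -c test3', 'stdin': 'infile', 'stdout': 'outfile', 'stderr': 'err'}
```

Unlike a single pattern with a fixed group order, this accepts the redirections in any order; a repeated operator simply overwrites the earlier target.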