1

I have a Python string that arrives in a standard format, and I want to extract a piece of it.

The string looks like this:

logs(env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired) @Properties.Source:Session).index(processing).rollup(count).by(@Properties.AcsInfo.Host).last(15m) > 60

I want to extract everything inside logs(); that is, I need to get this: env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired) @Properties.Source:Session

I have tried the regexes below, but neither works:

result = re.search('logs((.+?)).', message.strip())
return result.group(1)

result = re.search('logs((.*?)).', message.strip())
return result.group(1)

Can someone please help me?

Mervin Hemaraju

3 Answers

2

Conclusion first:

import pyparsing as pp

txt = 'logs(env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired) @Properties.Source:Session).index(processing).rollup(count).by(@Properties.AcsInfo.Host).last(15m) > 60'

pattern = pp.Regex(r'.*?logs(?=\()') + pp.original_text_for(pp.nested_expr('(', ')'))
result = pattern.parse_string(txt)[1][1:-1]
print(result)

* You can install pyparsing with pip install pyparsing.

If you insist on using regex, this answer will not be appropriate. According to this post, however, nested parentheses like these seem difficult to parse with regex, so I used pyparsing to handle your case.

Other examples:

The following examples work fine as well:

txt = 'logs(a(bc)d)e'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a(bc)d

txt = 'logs(a(b(c)d)e(f)g)h(ij(k)l)m'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a(b(c)d)e(f)g

Note:

Unfortunately, if the parentheses inside logs() are unbalanced, you get an unexpected result or an IndexError is raised. So you have to be careful about what kind of text comes in:

txt = 'logs(a)b)c'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # a

txt = 'logs(a(b)c'
result = pattern.parse_string(txt)[1][1:-1]
print(result) # IndexError
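
If adding a dependency is not an option, the same balanced-parenthesis idea can be sketched with a plain depth counter. This is not the pyparsing approach above but a swapped-in alternative: extract_call_args is a hypothetical helper name, and the sketch assumes parentheses never appear inside quoted strings. An unclosed parenthesis raises ValueError explicitly rather than failing somewhere inside the parser.

```python
def extract_call_args(text, name="logs"):
    """Return the text inside the balanced parentheses following `name`.

    Hypothetical helper for illustration; assumes no quoted parentheses.
    """
    start = text.find(name + "(")
    if start == -1:
        raise ValueError(f"{name}( not found")
    open_pos = start + len(name)  # index of the opening parenthesis
    depth = 0
    for i in range(open_pos, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                # Matching close found: return what lies between the pair.
                return text[open_pos + 1:i]
    raise ValueError("unbalanced parentheses")


print(extract_call_args('logs(a(bc)d)e'))                  # a(bc)d
print(extract_call_args('logs(a(b(c)d)e(f)g)h(ij(k)l)m'))  # a(b(c)d)e(f)g
```

As with the pyparsing version, 'logs(a)b)c' still yields a, because the first balanced pair closes right after the a.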
quasi-human
0

If that input string is always in exactly the same format, then you could use the fact that the closing bracket for logs is followed by a .:

original = '''logs(env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired)@Properties.Source:Session).index(processing).rollup(count).by(@Properties.AcsInfo.Host).last(15m) > 60'''
extracted = original.split('logs(')[1].split(').')[0]
print(extracted)

Which gives you this, without the need for regex:

'env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired)@Properties.Source:Session'
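
If the markers might be missing from some inputs, the chained split raises IndexError when 'logs(' is absent. A slightly more defensive sketch of the same idea uses str.partition, which reports whether the separator was found. extract_between is a hypothetical helper name, and returning None on a miss is an assumption about what the caller wants.

```python
def extract_between(text, start="logs(", end=")."):
    """Return the text between the first `start` and the next `end`.

    Hypothetical helper; returns None if either marker is missing.
    """
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    inner, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return inner


sample = 'logs(a:(b OR c) d:e).index(x).last(15m) > 60'
print(extract_between(sample))        # a:(b OR c) d:e
print(extract_between('no markers'))  # None
```

The same caveat applies as with split: if the query itself contains the sequence ').' (for example 'logs(a(b).c).d'), the result is cut short at that point.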
PangolinPaws
  • Seems like a fair way to do it, but is there any chance we could achieve this through regex? Just asking – Mervin Hemaraju Feb 02 '22 at 10:33
  • this assumes that after `logs()` there's a `.` which although matches the example is not what the question asks: `I want to extract everything between logs()` –  Feb 02 '22 at 10:36
  • That's a fair point, @SembeiNorimaki, hence the caveat at the top of my answer. – PangolinPaws Feb 02 '22 at 12:49
0

You can achieve the result via regex like this:

import re

message = "logs(env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired) @Properties.Source:Session).index(processing).rollup(count).by(@Properties.AcsInfo.Host).last(15m) > 60"
pattern = r'logs\((?P<log>.*)\)\.index'  # escape the literal parentheses and the dot
print(re.search(pattern, message).group('log'))
# which prints:
# env:production service:FourDS3.Expirer @Properties.NewStatus:(ChallengeAbandoned OR Expired) @Properties.Source:Session

The (?P<log>...) syntax defines a named group, which you access by calling group() with the name given inside the angle brackets.
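
For comparison, here is a short sketch of why the patterns in the question fail and how escaping changes the result. In a regex, unescaped parentheses are grouping syntax, so they never match a literal ( or ). The variant at the end anchors on the literal ")." rather than on ".index"; that is an assumption about the input format (the logs(...) call is always followed by a dot), not something the question guarantees.

```python
import re

message = ("logs(env:production service:FourDS3.Expirer "
           "@Properties.NewStatus:(ChallengeAbandoned OR Expired) "
           "@Properties.Source:Session).index(processing).rollup(count)"
           ".by(@Properties.AcsInfo.Host).last(15m) > 60")

# Attempt from the question: both pairs of parentheses are groups, so the
# lazy .+? captures just one character, the literal "(" after "logs".
broken = re.search(r'logs((.+?)).', message)
print(repr(broken.group(1)))  # '('

# Escaped parentheses match literally. The lazy .*? keeps extending until
# it reaches a ")" that is directly followed by a literal dot.
fixed = re.search(r'logs\((?P<log>.*?)\)\.', message)
print(fixed.group('log'))
```

The second print shows the full query text between logs( and the closing ")." that precedes index.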

anotherGatsby