I have this test string, which I want to extract data from using Regex:

"-> Single-row index lookup on using <auto_distinct_key> (actor_id=actor.actor_id) (cost=901.66 rows=5478) (actual time=0.001..0.001 rows=0 loops=200)"

The six fields of data that I want to extract are the operation description at the start of the line, then `901.66`, `5478`, `0.001..0.001`, `0` and `200`. The `(cost=901.66 rows=5478)` segment is the optional part of the statement.

This is the pattern which I have arrived at thus far:

-> (.+) (\(cost=([\d\.]+) rows=(\d+)\))? \(actual time=([\d\.]+) rows=(\d+) loops=(\d+)\)

This captures all six fields. However, when I omit the optional part of the string, it does not match at all. I suspected this was due to superfluous whitespace, so I thought it might work to move the whitespace into the optional group, like this:

-> (.+)( \(cost=([\d\.]+) rows=(\d+)\))? \(actual time=([\d\.]+) rows=(\d+) loops=(\d+)\)

That did not work either.

It seems to match the optional group as part of the first group, which is not really what I want. I want them separate, and I'm not quite sure how to do that.
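
The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module, which is an assumption, since the question does not name a regex engine; the pattern is the second one from the question, unchanged.

```python
import re

# Second pattern from the question, unchanged: the first group is greedy.
GREEDY = re.compile(
    r"-> (.+)( \(cost=([\d\.]+) rows=(\d+)\))?"
    r" \(actual time=([\d\.]+) rows=(\d+) loops=(\d+)\)"
)

line = ("-> Single-row index lookup on using <auto_distinct_key> "
        "(actor_id=actor.actor_id) (cost=901.66 rows=5478) "
        "(actual time=0.001..0.001 rows=0 loops=200)")

m = GREEDY.search(line)
print(m.group(1))  # the description, with the cost segment still attached
print(m.group(2))  # None: the optional group matched nothing
```

The greedy `(.+)` takes the whole line and backtracks only until ` \(actual time=...` can match, so the optional group is never needed and stays empty.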

  • @RokoC.Buljan I know it might look a bit confusing, but the first group is really just info about a specific query operation and not all that relevant. It could very well have been "Nested loop inner join" instead of "Single-row index lookup on using (actor_id=actor.actor_id)" as in this case. – Commodent Sep 16 '20 at 21:54
  • So, you want to retrieve: `Single-row index lookup on using (actor_id=actor.actor_id)`. Then optionally `901.66` and `5478`. And `0.001..0.001`, `0`, `200`? – Roko C. Buljan Sep 16 '20 at 21:58
  • @RokoC.Buljan Yes, that is correct. – Commodent Sep 16 '20 at 21:59

1 Answer

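You have to make the first (.+) lazy quantifier (.+?). A greedy (.+) first consumes as much of the line as it can and backtracks only far enough for the rest of the pattern to match; because the cost group is optional, the engine is satisfied with leaving the cost segment inside group 1. A lazy (.+?) grows one character at a time instead, so the optional group gets the chance to match the cost segment before group 1 can swallow it.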
https://regex101.com/r/fbE0tW/1

 # (.+?)[ ]?((?<=[ ])\(cost=([\d\.]+)[ ]rows=(\d+)\))?[ ]\(actual[ ]time=([\d\.]+)[ ]rows=(\d+)[ ]loops=(\d+)\)
 
 ( .+? )                       # (1)
 [ ]? 
 (                             # (2 start)
    (?<= [ ] )
    \( cost=
    ( [\d\.]+ )                   # (3)
    [ ] rows=
    ( \d+ )                       # (4)
    \) 
 )?                            # (2 end)
 [ ] 
 \( 
 actual [ ] time=
 ( [\d\.]+ )                   # (5)
 [ ] 
 rows=
 ( \d+ )                       # (6)
 [ ] 
 loops=
 ( \d+ )                       # (7)
 \)
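
A minimal sketch of how the pattern above could be used, assuming Python's `re` module (the question does not name an engine). The literal `->[ ]` prefix is taken from the question's pattern and is not part of the pattern above; `PLAN_LINE` and `parse_plan_line` are hypothetical names chosen for illustration.

```python
import re

# The pattern above, with the "-> " prefix from the question added in front
# (an assumption) so that group 1 holds only the operation description.
PLAN_LINE = re.compile(
    r"->[ ](.+?)[ ]?((?<=[ ])\(cost=([\d\.]+)[ ]rows=(\d+)\))?"
    r"[ ]\(actual[ ]time=([\d\.]+)[ ]rows=(\d+)[ ]loops=(\d+)\)"
)

def parse_plan_line(line):
    """Return the six fields as a tuple, or None if the line does not match.

    Group 2 only wraps the optional cost segment, so it is skipped; the two
    cost fields are None when that segment is absent.
    """
    m = PLAN_LINE.search(line)
    if m is None:
        return None
    return m.group(1, 3, 4, 5, 6, 7)

with_cost = ("-> Single-row index lookup on using <auto_distinct_key> "
             "(actor_id=actor.actor_id) (cost=901.66 rows=5478) "
             "(actual time=0.001..0.001 rows=0 loops=200)")
without_cost = with_cost.replace(" (cost=901.66 rows=5478)", "")

print(parse_plan_line(with_cost))
print(parse_plan_line(without_cost))
```

Both lines match. In the second call the cost and estimated-rows fields come back as `None`, while the description and the three "actual" fields are unchanged.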
  • "You have to make the first (.+) lazy quantifier (.+?)" Thank you, that solved the problem. Much appreciated. – Commodent Sep 16 '20 at 22:14