I've been trying to parse a value inside a block of text.

Let me explain with an example.

I have the following text:

started xx xxxxxxx xxxxx xxxxxx xx xxxxxxxxx xxxxxxx xxxx xx
xx xxx xxxxx xxxx xxxxxxxx xxxx xxxxxx found 9999 xxxxx xxxxx
xxx xx xxxx xxxx xxxxxxxxxxx xxxxxxx xxx stored 9999 finished

I'm trying to capture everything between "started" and "finished".

I tried something like this

(?<block>started(.|\n)*finished)

but I don't know how to also capture the \d+ value next to "stored".

Mike
  • Does this answer your question? [How to match "anything up until this sequence of characters" in a regular expression?](https://stackoverflow.com/questions/7124778/how-to-match-anything-up-until-this-sequence-of-characters-in-a-regular-expres) – n00dl3 Nov 21 '19 at 09:29
  • The regex does not work with Python `re`: `(?<` must be `(?P<`. Do not use `(.|\n)*`; use `.*?` with `re.DOTALL`. If you need to capture the digits, try `re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S)` – Wiktor Stribiżew Nov 21 '19 at 09:30
  • `re.match(r"started .+?found (\d+) .+? stored (\d+) finished", text, flags=re.DOTALL)` – n00dl3 Nov 21 '19 at 09:35

1 Answer

The regex you provided does not work with Python re because (?<block>...) is not a supported named group syntax. In Python it must be written as (?P<block>...).

Also, avoid (.|\n)*, which is a very inefficient construct. Use .*? with re.DOTALL/re.S (or the inline (?s) flag) instead.
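As a minimal sketch of those two fixes together, the snippet below uses the (?P<block>...) named group and re.DOTALL in place of (.|\n)*. The sample string is a shortened, hypothetical stand-in for the text in the question, and the variable names are chosen only for illustration.

```python
import re

# Hypothetical, shortened sample shaped like the text in the question.
text = (
    "started xx xxxxxxx\n"
    "xx xxx found 9999 xxxxx\n"
    "xxx stored 9999 finished"
)

# (?P<block>...) is Python's named-group syntax.
# re.DOTALL lets "." match newlines, so (.|\n)* is not needed.
pattern = re.compile(r"(?P<block>started.*?finished)", re.DOTALL)

match = pattern.search(text)
block = match.group("block") if match else None
print(block)
```

Here the lazy .*? stops at the first "finished", so each block ends at its nearest right-hand delimiter.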

If you need to capture the block together with the digits after stored and before finished (and if that part is optional), use

re.findall(r'started(.*?(?:stored\s+(\d+)\s+)?)finished', text, re.S)

See the regex demo

Details

  • started - left-hand delimiter
  • (.*?(?:stored\s+(\d+)\s+)?) - Group 1:
    • .*? - any 0+ chars, as few as possible
    • (?:stored\s+(\d+)\s+)? - an optional group matching
      • stored\s+ - stored and 1+ whitespaces
      • (\d+) - Group 2: one or more digits
      • \s+ - 1+ whitespaces
  • finished - right-hand delimiter.
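The breakdown above can be tried out with a short sketch. The sample text is a hypothetical, shortened version of the text in the question (with 1234 as the stored value so it is easy to tell apart from the found value), and the variable names are illustrative only.

```python
import re

# Hypothetical, shortened sample shaped like the text in the question.
text = (
    "started xx xxxxxxx\n"
    "xx xxx found 9999 xxxxx\n"
    "xxx stored 1234 finished"
)

pattern = re.compile(r"started(.*?(?:stored\s+(\d+)\s+)?)finished", re.S)

# With two capturing groups, findall returns a list of (group 1, group 2) tuples.
results = pattern.findall(text)
for block, stored in results:
    print(repr(block), stored)

# Group 2 is an empty string when the optional "stored <digits>" part is absent,
# so filter those out to keep only the real values.
stored_values = [stored for _, stored in results if stored]
print(stored_values)
```

Because the stored part is optional, a block without it still matches and simply yields an empty string for Group 2.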
Wiktor Stribiżew
  • @IgorShilov As usual, `with open(file + ".out.txt", 'w') as fw: fw.write(updated_contents)` – Wiktor Stribiżew Jan 10 '20 at 10:50
  • @IgorShilov I have no idea what you mean. Please add the non-working code to the question and explain the expected behavior. – Wiktor Stribiżew Jan 10 '20 at 10:59
  • In my case the regex has to run over a log file larger than 15 MB, and this code did not work: `with open('log.log') as reading:` `line = reading.read()` `regex = re.findall(r'\d+.*07.*?started\s+\w+(.*?(?:stored\s+(\d+)\s+)?)finished.\w+', line, re.DOTALL)` `if regex:` `for res1, res2 in regex:` `print res2` – Igor Shilov Jan 10 '20 at 11:14
  • It works only in the online IDE. To make it work with a file you need the construct `with open(file, 'r') as read:`, and then you read the file either line by line with `readlines()` or in full with `read()`. Which method should I use, and how do I write the code correctly in this case? See: https://ideone.com/zK2k0g – Igor Shilov Jan 10 '20 at 11:57
  • @IgorShilov You seem to be using Python 2. Apart from this, everything looks good. `reading.read()` is the right method since it reads the entire file contents into a single variable. – Wiktor Stribiżew Jan 10 '20 at 12:09
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/205749/discussion-between-igor-shilov-and-wiktor-stribizew). – Igor Shilov Jan 10 '20 at 12:11