
I am working on a regex problem that I'm unable to solve. This is the regex I've written so far:

import re
message = """[key    X] value
[key    X]  value value
[key    X]  value
value
value
value
[key     ] value
[key     ] ?
[key     ] ?"""

messageRegex = re.compile(r"\[(.*?)][\s](.*)")

for value in messageRegex.findall(message):
    print(value)

The output is given below; not everything is getting captured.

('key    X', 'value') ('key\tX', 'value value') ('key\tX', 'value')
('key\t ', 'value') ('key\t ', '?') ('key\t ', '?')


I would expect the output to look like this:

('key    X', 'value') ('key\tX', 'value value') ('key\tX', 'value \nvalue \nvalue \nvalue')
('key\t ', 'value') ('key\t ', '?') ('key\t ', '?')
ArunK
  • Anchor at the start and make the first two patterns optional - `^(?:\[(.*?)]\s+)?(.*)`, see https://regex101.com/r/h3wwUa/1 – Wiktor Stribiżew May 10 '19 at 11:48
  • @WiktorStribiżew Thanks for the response. I may not have explained it correctly - From the link you've provided I'm looking for the 'value' in line 4,5,6 to be a part of match 3 – ArunK May 10 '19 at 13:06
  • You should have explained it in the question, please edit it. Is `message` a single string variable? – Wiktor Stribiżew May 10 '19 at 13:07
  • Yes, the message is a single string. – ArunK May 10 '19 at 13:08
  • Try `re.findall(r'^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)', message, re.M)` – Wiktor Stribiżew May 10 '19 at 13:09
  • One last one, how about removing the X at the end? – ArunK May 10 '19 at 13:13
  • Try `re.compile(r"^\[([^][]*?)X?]\s+(.*(?:\n(?!\[[^][]*]).*)*)", re.M)`, see [Python demo](https://ideone.com/y034Oz) and the [regex demo](https://regex101.com/r/fySzoJ/2). Just move it out of the first group and add `?` after it to make it optional, make Group 1 pattern lazy. – Wiktor Stribiżew May 10 '19 at 13:15

1 Answer


You may use

(?m)^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)

See the regex demo

Details

  • ^ - start of a line
  • \[ - [
  • ([^][]*) - Group 1: any 0+ chars other than [ and ]
  • ] - a ] char
  • \s+ - one or more whitespace chars
  • (.*(?:\n(?!\[[^][]*]).*)*) - Group 2:
    • .* - the rest of the line
    • (?:\n(?!\[[^][]*]).*)* - zero or more repetitions of:
      • \n(?!\[[^][]*]) - a newline not followed by a [...] substring
      • .* - the rest of the line
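
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of the same regex; the names MESSAGE_RE, sample and result are illustrative and not part of the original code, and the sample text is a shortened stand-in for the question's message.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part is commented.
# MESSAGE_RE, sample and result are illustrative names (assumptions).
MESSAGE_RE = re.compile(
    r"""
    ^                  # start of a line (requires re.MULTILINE)
    \[                 # a literal [
    ([^][]*)           # group 1: zero or more chars other than [ and ]
    ]                  # a literal ]
    \s+                # one or more whitespace chars
    (                  # group 2:
      .*               #   the rest of the line
      (?:              #   zero or more continuation lines:
        \n             #     a newline ...
        (?!\[[^][]*])  #     ... not followed by a bracketed key
        .*             #     the rest of that line
      )*
    )
    """,
    re.MULTILINE | re.VERBOSE,
)

# Shortened sample in the same shape as the question's message.
sample = "[key X] value\nvalue\nvalue\n[key  ] ?"
result = MESSAGE_RE.findall(sample)
for pair in result:
    print(pair)
```

The behaviour should be identical to the one-line form; the verbose layout only makes the lookahead that stops each match at the next bracketed key easier to spot.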

Python demo:

import re
message = """[key    X] value
[key    X]  value value
[key    X]  value
value
value
value
[key     ] value
[key     ] ?
[key     ] ?"""

messageRegex = re.compile(r"^\[([^][]*)]\s+(.*(?:\n(?!\[[^][]*]).*)*)", re.M)

for value in messageRegex.findall(message):
    print(value)

Output:

('key    X', 'value')
('key    X', 'value value')
('key    X', 'value\nvalue\nvalue\nvalue')
('key     ', 'value')
('key     ', '?')
('key     ', '?')
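
For the follow-up in the comments (dropping the X at the end of the key), the variant suggested there moves an optional X out of Group 1 and makes Group 1 lazy. A minimal sketch on a shortened sample; the rstrip() call and the names sample and pairs are additions for illustration, not part of the original suggestion.

```python
import re

# Variant from the comments: an optional X sits outside group 1, and
# group 1 is lazy (*?) so it does not consume the X itself.
message_regex = re.compile(
    r"^\[([^][]*?)X?]\s+(.*(?:\n(?!\[[^][]*]).*)*)", re.M
)

# Shortened sample in the same shape as the question's message (assumption).
sample = "[key    X] a\nb\n[key     ] ?"
pairs = message_regex.findall(sample)

for key, value in pairs:
    # rstrip() is an optional extra: group 1 still ends with the padding spaces.
    print((key.rstrip(), value))
```

Without the lazy quantifier, the greedy `[^][]*` would take the X into Group 1 and the optional `X?` would simply match nothing.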
Wiktor Stribiżew