0

Is there any expression that can replace the flag re.DOTALL in the following line of python code:

r = re.compile(r'<p>.*<p>.', re.IGNORECASE | re.MULTILINE | re.DOTALL)

like add any expression to this regular expression: r'<p>.*<p>.' to be something like :

r = re.compile(r'<p>something.*<p>.', re.IGNORECASE | re.MULTILINE )

but need to keep the same behavior, which is matching multiline string (paragraph)

Thanks

Elhabib
  • 141
  • 2
  • 10
  • Note you seem to be trying to extract content from HTML using regular expressions, in which case [have you considered an HTML parser?](https://stackoverflow.com/a/1732454/3001761) – jonrsharpe Feb 12 '21 at 08:56

2 Answers2

0

I think you have it backwards, and the following two should be equivalent:

r = re.compile(r'<p>[\s\S]*<p>.', re.IGNORECASE | re.MULTILINE)

r = re.compile(r'<p>.*<p>.', re.IGNORECASE | re.MULTILINE | re.DOTALL)

That is, matching .* in DOTALL mode should behave the same as matching [\s\S]* with DOTALL mode disabled.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • I am providing the regular expression from a json input file to the script like this: "regex":"

    [\s\S]*

    " but still showing an error message "Error decoding json in file "input_file.json" "

    – Elhabib Feb 12 '21 at 09:07
  • JSON? Don't use regex on JSON, use a parser instead. – Tim Biegeleisen Feb 12 '21 at 09:08
0

You can use the inline flag (?s).

re.compile(regex, re.DOTALL)

is equivalent to

re.compile('(?s)' + regex)
timgeb
  • 76,762
  • 20
  • 123
  • 145