0

The following regular expression is not returning any match:

import re

regex = '.*match.*fail.*'
pattern = re.compile(regex)
text = '\ntestmatch\ntestfail'

match = pattern.search(text)

I managed to solve the problem by changing text to repr(text) or setting text as a raw string with r'\ntestmatch\ntestfail', but I'm not sure if these are the best approaches. What is the best way to solve this problem?

  • Adding on to this, `.` traditionally matches all characters excluding the newline. Thus, even if you escape your string, you still need to consider this. – Mous Aug 05 '21 at 04:08
  • Not a good duplicate but this is a common FAQ; hang on while I look for a better one. – tripleee Aug 05 '21 at 05:38

2 Answers2

3

Using repr or raw string on a target string is a bad idea!
By doing that newline characters are treated as literal '\n'.
This is likely to cause unexpected behavior on other test cases.

The real problem is that . matches any character EXCEPT newline.
If you want to match everything, replace . with [\s\S].
This means "whitespace or not whitespace" = "anything".
Using other character groups like [\w\W] also works,
and it is more efficient for adding exception just for newline.

One more thing, it is a good practice to use raw string in pattern string(not match target).
This will eliminate the need to escape every characters that has special meaning in normal python strings.

WieeRd
  • 792
  • 1
  • 7
  • 17
0

You could add it as an or, but make sure you \ in the regex string, so regex actually gets the \n and not a actual newline.

Something like this:

regex = '.*match(.|\\n)*fail.*'

This would match anything from the last \n to match, then any mix or number of \n until testfail. You can change this how you want, but the idea is the same. Put what you want into a grouping, and then use | as an or.

enter image description here

On the left is what this regex pattern matched from your example.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Josh
  • 318
  • 2
  • 8