-2

Why do I need to add the DOTALL flag for a Python regular expression to match across newline characters when the pattern is a raw string? I ask because a raw string is supposed to leave escape sequences such as the newline character uninterpreted. From the docs:

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline.
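To make the quoted passage concrete, here is a minimal sketch of what the `r` prefix changes and what it does not. The variable names are only illustrative. The point it tries to show is that the prefix affects how Python reads the literal, while the regex engine still understands the two-character sequence `\n` as "match a newline":

```python
import re

raw_pattern = r"\n"    # two characters: a backslash and the letter n
plain_pattern = "\n"   # one character: an actual newline

print(len(raw_pattern), len(plain_pattern))  # 2 1

# The re module interprets the two-character escape \n itself,
# so both patterns match a real newline in the subject string.
subject = "a\nb"
raw_match = re.search(raw_pattern, subject)
plain_match = re.search(plain_pattern, subject)
print(raw_match is not None, plain_match is not None)  # True True
```

In other words, the raw prefix says nothing about what `.` matches; that is controlled separately by the regex flags.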

This is my situation:

import re

string = '\nSubject sentence is:  Appropriate support for families of children diagnosed with hearing impairment\nCausal Verb is :  may have\npredicate sentence is:  a direct impact on the success of early hearing detection and intervention programs in reducing the negative effects of permanent hearing loss'

re.search(r"Subject sentence is:(.*)Causal Verb is :(.*)predicate sentence is:(.*)", string, re.DOTALL)

This results in a match. However, when I remove the DOTALL flag, I get no match.

kolonel

2 Answers

2

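In a regex, . matches any character except a newline (\n). The raw-string prefix only changes how Python reads the pattern literal; it does not change what . matches.

So if your string contains newlines, .* stops at the first newline and cannot match past it.

In Python, if you pass the re.DOTALL flag (also known as re.S), the dot matches \n as well.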
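The behaviour described above can be sketched with a shortened stand-in for the string in the question. The labels `Subject:`, `Verb:` and `Predicate:` are made up for brevity and are not the asker's exact text. The last pattern is one possible alternative that spells out the newlines instead of relying on the flag:

```python
import re

# Shortened, hypothetical stand-in for the multi-line string in the question.
text = "\nSubject: families\nVerb: may have\nPredicate: a direct impact"
pattern = r"Subject:(.*)Verb:(.*)Predicate:(.*)"

# Without DOTALL, "." refuses to cross the newline, so the search fails.
no_flag = re.search(pattern, text)
print(no_flag)  # None

# With DOTALL, "." also matches "\n", so each group runs up to the next label.
with_flag = re.search(pattern, text, re.DOTALL)
print(with_flag.groups())
# (' families\n', ' may have\n', ' a direct impact')

# Alternative: keep the default "." and match the newlines explicitly.
explicit = re.search(r"Subject:(.*)\nVerb:(.*)\nPredicate:(.*)", text)
print(explicit.groups())
# (' families', ' may have', ' a direct impact')
```

Note that with DOTALL the captured groups include the trailing newline, whereas the explicit version leaves it out of the groups.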

Sabuj Hassan
1

Your source string is not raw; only your pattern string is.

Maybe try:

string = r'\n...\n'
re.search("Subject sentence is:(.*)Causal Verb is :(.*)predicate sentence is:(.*)", string)
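One caveat worth checking before adopting this: making the source string raw changes the data itself, not the behaviour of the dot. A rough sketch with a made-up two-line string (the names `escaped` and `raw` are illustrative only):

```python
import re

escaped = "line one\nline two"   # contains one real newline character
raw = r"line one\nline two"      # contains a backslash followed by the letter n

pattern = "one(.*)line two"

# A real newline sits between the two parts, and "." will not match it.
escaped_match = re.search(pattern, escaped)
print(escaped_match)  # None

# In the raw version there is no newline at all, only the two ordinary
# characters "\" and "n", which "." matches without any flag.
raw_match = re.search(pattern, raw)
print(repr(raw_match.group(1)))  # '\\n'
```

So the raw-source approach only fits when the text genuinely contains a literal backslash and `n`. If the text has real line breaks, for example because it was read from a file, re.DOTALL or an explicit `\n` in the pattern is probably the safer route.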
bbold