
When matching an expression across multiple lines, I have always used re.DOTALL and it worked fine. Now I have stumbled across the re.MULTILINE flag, and it looks like it does the same thing.

From the re module (doesn't make it clearer, but the values are different):

M = MULTILINE = sre_compile.SRE_FLAG_MULTILINE # make anchors look for newline
S = DOTALL = sre_compile.SRE_FLAG_DOTALL # make dot match newline

SRE_FLAG_MULTILINE = 8 # treat target as multiline string
SRE_FLAG_DOTALL = 16 # treat target as a single string

So is there a difference in usage, and what are the subtle cases where the two could return something different?

Jean-François Fabre
  • I'm glad there is this Python-specific question. The supposed "duplicate" was not helpful for me. – Trevor Boyd Smith Aug 30 '17 at 17:45
  • Thanks! All things considered, I'm leaving it closed. Once Martijn has answered there's no need to add anything :) Being closed as a duplicate isn't necessarily bad, and in this case it allowed you to find the question and get your answer, so it is useful. – Jean-François Fabre Aug 30 '17 at 19:13
  • but I undeleted it, because I feel that the title keywords are useful for future users (check above comment :)) – Jean-François Fabre Mar 27 '19 at 07:05

2 Answers


They are quite different. Yes, both affect how newlines are treated, but they switch behaviour for different concepts.

  • re.MULTILINE affects where ^ and $ anchors match.

    Without the switch, ^ matches only at the start of the whole text and $ only at the end (or just before a single trailing newline at the very end). With the switch, ^ also matches immediately after each newline and $ immediately before each newline:

    >>> import re
    >>> re.search('foo$', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo$', 'foo\nbar', flags=re.MULTILINE)
    <_sre.SRE_Match object; span=(0, 3), match='foo'>
    
  • re.DOTALL affects what the . pattern can match.

    Without the switch, . matches any character except a newline. With the switch, newlines are matched as well:

    >>> re.search('foo.', 'foo\nbar') is None  # no match
    True
    >>> re.search('foo.', 'foo\nbar', flags=re.DOTALL)
    <_sre.SRE_Match object; span=(0, 4), match='foo\n'>
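
To see that the two flags are independent, here is a minimal sketch that runs one pattern under each combination. The sample text and the variable names are made up for illustration and are not part of the answer above.

```python
import re

# Hypothetical sample text with three lines and no trailing newline.
text = "foo\nbar\nbaz"
pattern = r"^.+$"

# Neither flag: "." stops at the first newline and "$" does not match there,
# so the pattern cannot match at all.
neither = re.findall(pattern, text)

# MULTILINE only: "^" and "$" match at every line boundary,
# while "." still refuses to cross a newline.
multiline_only = re.findall(pattern, text, flags=re.MULTILINE)

# DOTALL only: "." crosses newlines, but "^" and "$" stay anchored
# to the start and end of the whole string.
dotall_only = re.findall(pattern, text, flags=re.DOTALL)

# Both flags: the greedy ".+" swallows everything up to the final "$".
both = re.findall(pattern, text, flags=re.MULTILINE | re.DOTALL)

print(neither)         # []
print(multiline_only)  # ['foo', 'bar', 'baz']
print(dotall_only)     # ['foo\nbar\nbaz']
print(both)            # ['foo\nbar\nbaz']
```

With both flags set, the result depends on greediness: a lazy `.+?` would stop at the first line end that `$` accepts, giving one match per line again.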
    
Martijn Pieters
  • FYI for those reading this: negated character classes like `r'[^x]'` seem to match newlines **whether or not** dotall is set! e.g. while `re.match(r'.*', 'abc\n123')` only nabs the first line without dotall, `re.match(r'[^4]*', 'abc\n123')` nabs the whole string – JamesTheAwesomeDude May 02 '23 at 19:04
  • @JamesTheAwesomeDude: that's entirely to be expected, because `[^x]` is not the `.` pattern. `DOTALL` only changes how `.` works, not negated character classes. You can always add `\n` to the negated class: `[^x\n]`. – Martijn Pieters Jun 13 '23 at 20:11
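
The point made in the two comments above can be checked with a short sketch. It reuses the string from the comment; the variable names are illustrative only.

```python
import re

text = "abc\n123"

# "." without DOTALL stops at the newline.
dot_default = re.match(r".*", text).group()

# A negated character class matches any character not listed,
# including "\n", regardless of DOTALL.
negated = re.match(r"[^4]*", text).group()

# Adding "\n" to the negated class makes it stop at the line end again.
negated_no_newline = re.match(r"[^4\n]*", text).group()

# With DOTALL, "." behaves like the plain negated class here.
dot_dotall = re.match(r".*", text, flags=re.DOTALL).group()

print(repr(dot_default))         # 'abc'
print(repr(negated))             # 'abc\n123'
print(repr(negated_no_newline))  # 'abc'
print(repr(dot_dotall))          # 'abc\n123'
```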

They do not do the same thing: DOTALL makes . match the newline character as well, while MULTILINE makes ^ and $ match at the start and end of every line.

Example:

The quick brown fox 
jumps over the lazy dog.

Here, .+ yields two matches (the first line and the second line) when DOTALL is off. With DOTALL turned on, it matches the whole text as a single result.

The quick brown fox 
jumps over the lazy dog.

Here, ^\w+ with MULTILINE mode on matches twice, because there is a word character at the start of each line.
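
Both examples can be reproduced with re.findall. This is a small sketch that assumes the sample text is a single string with one newline between the two lines, and that the trailing space after "fox" is part of it.

```python
import re

# Assumed form of the two-line example text (note the space before "\n").
text = "The quick brown fox \njumps over the lazy dog."

# ".+" without DOTALL: one match per line, because "." stops at "\n".
per_line = re.findall(r".+", text)

# ".+" with DOTALL: a single match spanning both lines.
whole = re.findall(r".+", text, flags=re.DOTALL)

# "^\w+" with MULTILINE: the first word of every line.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)

# "^\w+" without MULTILINE: only the first word of the whole text.
first_word_only = re.findall(r"^\w+", text)

print(per_line)         # ['The quick brown fox ', 'jumps over the lazy dog.']
print(whole)            # ['The quick brown fox \njumps over the lazy dog.']
print(first_words)      # ['The', 'jumps']
print(first_word_only)  # ['The']
```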

Jan