A component in Python's docutils module uses the regular expression below in the machinery that translates text flanked by asterisks into italicised text:
Raw: Most people know what is meant by the Latin phrase *Carpe Diem*.
Translated: Most people know what is meant by the Latin phrase Carpe Diem (rendered in italics).
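As a rough illustration of the raw-to-translated step, here is a toy converter. It is not how docutils works internally (docutils recognises inline markup with a considerably more elaborate set of rules and builds a document tree rather than emitting HTML); the `<em>` output, the `TOY_EMPHASIS` pattern and the `toy_emphasis` helper are all hypothetical, made up for this sketch.

```python
import re

# Hypothetical toy pattern: an opening asterisk not followed by whitespace,
# a run of non-asterisk characters, then a closing asterisk not preceded by whitespace.
TOY_EMPHASIS = re.compile(r'\*(?!\s)([^*]+?)(?<!\s)\*')

def toy_emphasis(text):
    """Replace *text* with <em>text</em> (hypothetical helper, not part of docutils)."""
    return TOY_EMPHASIS.sub(r'<em>\1</em>', text)

print(toy_emphasis("Most people know what is meant by the Latin phrase *Carpe Diem*."))
```

Lone asterisks surrounded by spaces, as in `a * b * c`, are left untouched because the lookahead and lookbehind reject whitespace next to the markers.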
It's a pretty straightforward pattern: match an asterisk if it is not preceded by a space, a newline or the null character. What I'd like to know is what's gained by appending the empty unicode string (u'') to the pattern? It's appended to a number of other patterns that are also found within docutils, but I've no idea what difference it makes to whether a given bit of text matches or not.
import re

# Negative lookbehind: the next character must not follow a space, newline or NUL.
non_whitespace_escape_before = r'(?<![ \n\x00])'
end_string_suffix = u''
emphasis = re.compile(non_whitespace_escape_before + r'(\*)' + end_string_suffix, re.U)
# emphasis.pattern -> u'(?<![ \\n\\x00])(\\*)'
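One way to check whether the suffix affects matching is to compile the pattern with and without it and compare the results. This is a minimal sketch assuming Python 3, where `r'...' + u''` is still an ordinary `str` (the `u` prefix is accepted but has no effect). Under Python 2 the same concatenation promotes the byte-string pattern to `unicode`, which is consistent with the `u'...'` shown in the `emphasis.pattern` comment. The sample strings are made up for the test.

```python
import re

non_whitespace_escape_before = r'(?<![ \n\x00])'
end_string_suffix = u''

with_suffix = re.compile(non_whitespace_escape_before + r'(\*)' + end_string_suffix, re.U)
without_suffix = re.compile(non_whitespace_escape_before + r'(\*)', re.U)

samples = [
    "the Latin phrase *Carpe Diem*.",  # only the closing asterisk matches
    "a * b",                           # asterisk preceded by a space: no match
    "*start",                          # nothing precedes position 0, so it matches
    "line\n*next",                     # asterisk preceded by a newline: no match
]

for s in samples:
    # Start offsets of every match for each compiled pattern.
    a = [m.start() for m in with_suffix.finditer(s)]
    b = [m.start() for m in without_suffix.finditer(s)]
    print(repr(s), a, b)

# Appending an empty string leaves the pattern text unchanged.
print(with_suffix.pattern == without_suffix.pattern)
```

In Python 3 both columns agree for every sample and the final line prints `True`, so any effect of the empty suffix here is on the string type, not on which text matches.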