I'm using Python with Matthew Barnett's regex module.
I have this string:
The well known *H*rry P*tter*.
I'm using this regex to process the asterisks and obtain <em>H*rry P*tter</em>:
import regex as re  # Matthew Barnett's regex module, needed for \p{L} and \p{N}
REG = re.compile(r"""
(?<!\p{L}|\p{N}|\\)
\*
([^\*]*?) # I need this part to deal with nested patterns; I really can't omit it
\*
(?!\p{L}|\p{N})
""", re.VERBOSE)
PROBLEM
The problem is that this regex doesn't match strings like this one unless I first protect the intraword asterisks (by converting them to decimal entities), which is very expensive in documents with lots of asterisks.
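Here is a minimal reproduction of the behaviour described above. It assumes the third-party regex module is installed and imported as re; the `protected` string is a hand-written stand-in for the result of the entity-protection step, and `&#42;` is assumed to be the decimal entity used for the asterisk.

```python
import regex as re  # third-party module (pip install regex), not the stdlib re

REG = re.compile(r"""
    (?<!\p{L}|\p{N}|\\)   # opening asterisk must not follow a letter, digit or backslash
    \*
    ([^\*]*?)             # content: anything except an asterisk
    \*
    (?!\p{L}|\p{N})       # closing asterisk must not precede a letter or digit
""", re.VERBOSE)

text = "The well known *H*rry P*tter*."

# Without protection nothing matches: the content class stops at the
# first intraword asterisk, and that asterisk is followed by a letter.
unchanged = REG.sub(r"<em>\1</em>", text)
print(unchanged)         # The well known *H*rry P*tter*.

# Hypothetical result of the protection step: intraword asterisks
# replaced by the decimal entity &#42; before the regex is applied.
protected = "The well known *H&#42;rry P&#42;tter*."
protected_result = REG.sub(r"<em>\1</em>", protected)
print(protected_result)  # The well known <em>H&#42;rry P&#42;tter</em>.
```

The second call only succeeds because the intraword asterisks are no longer literal asterisks, which is exactly the preprocessing I would like to avoid.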
QUESTION
Is it possible to make the negated character class stop at internal asterisks only when they are not surrounded by word characters?
I tried these patterns in vain:
([^(?:[^\p{L}|\p{N}]\*[^\p{L}|\p{N}])]*?)
([^(?<!\p{L}\p{N})\*(?!\p{L}\p{N})]*?)