
I'm trying to write a Prettify-style syntax highlighter for Qiskit Terra (which closely follows Python syntax). Apparently, Prettify uses JavaScript-flavored regexes. For instance, /^\"(?:[^\"\\]|\\[\s\S])*(?:\"|$)/, null, '"' is the rule that matches valid strings in Q#. Basically, I'm trying to put together the equivalent regex for Python.
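For reference, the Q# pattern can be exercised with Python's `re` module, since the constructs it uses (`[^"\\]`, `\\[\s\S]`, `$`) mean the same thing in both flavors. A minimal sketch; the sample inputs are made up for illustration:

```python
import re

# The Q# string rule quoted above, ported to a Python raw string.
#   [^"\\]    any character except a double quote or a backslash
#   \\[\s\S]  a backslash followed by any character (an escape sequence)
#   (?:"|$)   a closing quote, or end of input for an unterminated string
QSHARP_STRING = re.compile(r'^"(?:[^"\\]|\\[\s\S])*(?:"|$)')

samples = ['"plain" rest', r'"has \" escaped" rest', '"never closed']
for s in samples:
    m = QSHARP_STRING.match(s)
    print(m.group(0))
```

This prints `"plain"`, `"has \" escaped"` and `"never closed`: the escaped quote does not end the string, and the `$` branch accepts an unterminated string.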

Now, I know that Python supports strings within triple quotes, i.e. '''<string>''' and """<string>""" are valid strings (this format is especially common for docstrings). To handle this case, I wrote the corresponding capturing group as:

(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))

Here is the regex101 link.

This works okay except in some cases like:

''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''

Clearly, it should treat ''' 'This "is" my' && "first 'regex' sentence." ''' as one string and ''' 'This "is" the second.' ''' as another. Instead, the regex I wrote matches the whole thing as a single string (check the regex101 link). That is, it doesn't end the string when it reaches the ''' that closes the opening '''.
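The failure can be reproduced outside regex101 with Python's `re`. A quick check; for this input the result is the same with or without `re.MULTILINE`, because the first line ends in `&&` rather than `'''`:

```python
import re

# The two-line example from the question.
text = """''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''"""

# The question's pattern, unchanged. [^\\] accepts every character except a
# backslash (quotes and newlines included) and * is greedy, so the body runs
# past the first closing ''' and only stops where '''$ can match.
original = re.compile(r"(^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$))", re.MULTILINE)

matches = original.findall(text)
print(len(matches))        # 1
print(matches[0] == text)  # True: one match covering both lines
```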

How should I modify the regex (^\'{3}(?:[^\\]|\\[\s\S])*(?:\'{3}$)) to handle this case? I'm aware of this: How to match “anything up until this sequence of characters” in a regular expression? but it doesn't quite answer my question, at least not directly.
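One possible direction, sketched here as an assumption rather than a tested Prettify rule, is a "tempered" body: keep the escape alternative, but only allow a `'` when it does not start a closing `'''`. Lookahead, non-capturing groups and `[\s\S]` are also supported by JavaScript regexes, so the pattern string should carry over (presumably with a leading `^`, as in the Q# rule), though it is only checked with Python's `re` here:

```python
import re

# Sketch: inside the string body, allow
#   [^'\\]     any character except a quote or a backslash,
#   \\[\s\S]   an escape sequence (backslash + any character), or
#   '(?!'')    a single quote that is NOT the start of a closing '''.
# The body therefore cannot run past the first unescaped '''.
TRIPLE_SINGLE = re.compile(r"'''(?:[^'\\]|\\[\s\S]|'(?!''))*'''")

text = """''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''"""

for m in TRIPLE_SINGLE.finditer(text):
    print(m.group(0))
```

This prints the two strings on separate lines. Because escapes are consumed as a unit, a body such as `it\'''s` in `'''it\'''s'''` is also kept in one piece. String prefixes (`r`, `b`, `f`, ...) are not handled by this sketch.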

  • Check out a Sublime Text package for syntax highlighting and go through its regex snippets. The grammars in https://github.com/MagicStack/MagicPython might help. – ABC Mar 29 '19 at 20:59
  • @Raymond Thanks, checking. I just noticed [this answer](https://stackoverflow.com/questions/1472047/regex-for-triple-quote/1472390#1472390) which makes me wonder whether it's possible using Regex at all. :/ –  Mar 29 '19 at 21:14

1 Answer


I don't know what else you want to use this for, but the following regex does what you want for the example given, with the MULTILINE flag on.

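import re

My_string = """''' 'This "is" my' && "first 'regex' sentence." ''' &&
''' 'This "is" the second.' '''"""

My_search = re.findall(r"(?:^'{3})(.*)(?:'{3})", My_string, re.MULTILINE)
print(My_search[0])
print(My_search[1])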

The output is:

'This "is" my' && "first 'regex' sentence." 
'This "is" the second.' 

You can also see it working here https://regex101.com/r/k4adk2/11

  • It solves the examples given in the OP, but it does not handle the general case of triple-quoted strings, as they can contain escaped quotes; e.g. in `r""""\""""` the `\"` should be caught. – JohanL Mar 31 '19 at 05:45
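JohanL's point can be checked directly. The sketch below is an extension, not part of the answer: it captures the opening `'''` or `"""` in group 1, refuses body characters that start the same delimiter, and consumes escapes as a unit. A lazy "anything up to the next delimiter" pattern is included for comparison:

```python
import re

# Group 1 captures the opening ''' or """. In the body:
#   (?!\1)[^\\]   a non-backslash character that does not start the closing delimiter
#   \\[\s\S]      an escape sequence, so \" or \' never ends the string
TRIPLE = re.compile(r"('''|\"\"\")(?:(?!\1)[^\\]|\\[\s\S])*\1")

# Lazy "anything up to the next delimiter", for comparison.
NAIVE = re.compile(r"('''|\"\"\")[\s\S]*?\1")

# JohanL's example: the Python literal r""""\"""".
example = r'r""""\""""'

print(TRIPLE.search(example).group(0))  # """"\"""" (the whole literal)
print(NAIVE.search(example).group(0))   # """"\""" (stops too early, at the escaped quote)
```

The lazy version ends the string at the `"""` that begins with the escaped quote, while the escape-aware version does not. Backreferences inside lookahead are also valid JavaScript regex syntax.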