I'm trying to make a syntax highlighter for Python using regular expressions (in Python). Among other things, I want to highlight keywords such as for, while, if etc. To do this I need a regex which matches them.

My issue is that I don't want, for instance, for to be matched when it is inside a string, only when isolated (whitespace before and after).

I had \bfor\b at first, which matches every occurrence of a standalone for. The problem is that it also matches inside strings, such as "string with for inside".

I have thought about look-behind/look-ahead (as this question suggests), but I can't get around the fact that look-behind requires fixed-width patterns in Python. I would love some tips on things to try here.

In short: what regex would match keywords such as for only when Python itself would interpret them as keywords?

Bendik

1 Answer

As others have mentioned, there are probably better-suited tools for the job. That being said, it's always fun to put regexes to new uses, and combined with a little bit of code it should be possible, just not with a single regex.
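For comparison, one such tool is the standard library's tokenize module, which already knows where strings and comments begin and end. The following is only a minimal sketch of that route, not part of the regex approach described below; the helper name keyword_spans is made up for illustration, and tokenize may raise an error on incomplete or invalid source.

```python
import io
import keyword
import tokenize

def keyword_spans(source):
    """Return (row, start_col, end_col, word) for each keyword token.

    Hypothetical helper for illustration. Rows are 1-based and columns
    are 0-based, as reported by the tokenize module.
    """
    spans = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Keywords arrive as NAME tokens; strings and comments have
        # their own token types, so they are skipped automatically.
        if tok.type == tokenize.NAME and keyword.iskeyword(tok.string):
            (row, col), (_, end_col) = tok.start, tok.end
            spans.append((row, col, end_col, tok.string))
    return spans

code = 'for x in y:\n    print("string with for inside")  # for\n'
print(keyword_spans(code))
```

Here only the for and in on the first line are reported; the for inside the string and the one in the comment are ignored.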

Now, there's no easy way to exclude strings (regexes in general don't handle paired delimiters nicely), so it would be simplest to create a copy of the text with every string replaced by spaces, so that indexing stays the same. Use something like \"[^"]*\" to find all strings (well, double-quoted strings), then replace each match with a run of spaces of the same length. Then run your keyword regex on the modified string.

Adding in cases for single quotes and comments gives (\"[^"]*\"|'[^']*'|#.*$), compiled with re.MULTILINE so that $ matches at the end of every line. Of course, this will break if the strings contain any escaped quotes, so you can look for fixes to that, e.g. this question.
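The two steps can be sketched roughly as follows. This is a minimal sketch, not a complete highlighter: the names mask, find_keywords and the short keyword list are assumptions made for illustration, and the pattern has the same limitations as above (escaped quotes and triple-quoted strings are not handled).

```python
import re

# Pattern from the answer: double-quoted strings, single-quoted strings,
# and comments. re.MULTILINE lets `$` match at the end of every line.
MASK = re.compile(r'''"[^"]*"|'[^']*'|#.*$''', re.MULTILINE)

# Illustrative keyword list (an assumption; extend it as needed).
KEYWORDS = re.compile(r'\b(?:for|while|if|else|in)\b')

def mask(source):
    """Step 1: blank out strings and comments, keeping the length intact."""
    def blank(match):
        # Keep newlines so that line numbers are preserved as well.
        return re.sub(r'[^\n]', ' ', match.group())
    return MASK.sub(blank, source)

def find_keywords(source):
    """Step 2: search the masked copy; offsets are valid in the original."""
    return [(m.start(), m.end(), m.group())
            for m in KEYWORDS.finditer(mask(source))]

source = 'for x in y:\n    s = "string with for inside"  # if\n'
print(find_keywords(source))
```

Because the masked copy has the same length as the original, the offsets returned by find_keywords can be used directly to colour the original text.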

user2699
  • I'm not sure how replacing the quotes would help in matching for the keywords. Could you explain, please? – Bendik Oct 21 '16 at 12:37
  • You have to break processing into two steps. In the first step, replace any strings or comments with spaces, since you don't want to match keywords inside them. In the second step, find keywords in this version of your text, which has the strings and comments blanked out. – user2699 Oct 21 '16 at 12:52