
I have a long line like this below in Python:

patterns = re.compile(r'(^/afs|^/pkg|^/bin|^/dev|^/etc|^/usr|^/usr2|^/sys|^/sbin|^/proc)')

I then tried splitting it across several lines, as shown below, but now it no longer works as expected:

patterns = re.compile(r'(^/afs|\
                         ^/pkg|\
                         ^/bin|\
                         ^/dev|\
                         ^/etc|\
                         ^/usr|\
                         ^/usr2|\
                         ^/sys|\
                         ^/sbin|\
                         ^/proc)')

I know Python uses indentation as part of its syntax, so maybe this format simply can't work? Or is there a correct way to format this in Python?
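A minimal reproduction of why the backslash version misbehaves. This is a sketch using only two of the alternatives; the input paths are illustrative. Indentation is not the problem here. In a raw string, a backslash at the end of a line is not a line continuation: the backslash, the newline and the next line's indentation all stay in the string. The regex engine then sees a literal newline plus spaces in front of every alternative after the first, and `^` can never match at that position.

```python
import re

# Two of the question's alternatives, split with a trailing backslash as in the question.
broken = re.compile(r'(^/afs|\
                         ^/pkg)')

# Shows the backslash, the newline and the indentation inside the pattern.
print(repr(broken.pattern))

print(broken.match('/afs/data'))  # the first alternative still matches
print(broken.match('/pkg/data'))  # None: this alternative expects '\n' and spaces first
```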

martineau
Jason T.

3 Answers


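If you want to keep everything in a single set of quotes, you can use a triple-quoted string together with the re.VERBOSE flag, which tells the regex engine to ignore the newlines and indentation. That removes the need for backslash line continuations or multiple r'^...' pieces, i.e.

patterns = re.compile(r"""
                         ...
                        """, re.VERBOSE)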

As described in the following post: How do I do line continuation with a long regex?

I generally find this approach more readable.

martineau
TomM
  • Thank you! This sounds like a good way to format it, but I guess I would still need "^", right? Such as r""" (^/afs| ^/pkg| ^/sbin) """. Otherwise, how would it know these regex patterns need to be at the beginning of the line? – Jason T. Sep 13 '21 at 23:39
  • @yuchit: Yes, you still need the `^` characters, which are part of the regex pattern itself. – martineau Sep 13 '21 at 23:52
  • @martineau Thank you! Yes, I also tried it out and can confirm `^` is still needed. – Jason T. Sep 13 '21 at 23:55
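A complete, runnable version of the triple-quoted approach, using the question's ten alternatives with `^` kept on each, as confirmed in the comments. This is a sketch that assumes the re.VERBOSE flag. Without that flag, the newlines and indentation inside the triple quotes are matched literally, which the second compile below shows.

```python
import re

# The question's ten alternatives in one triple-quoted raw string.
PATTERN = r"""
    ( ^/afs
    | ^/pkg
    | ^/bin
    | ^/dev
    | ^/etc
    | ^/usr
    | ^/usr2
    | ^/sys
    | ^/sbin
    | ^/proc
    )
"""

# With re.VERBOSE, the newlines and indentation are ignored by the regex engine.
verbose = re.compile(PATTERN, re.VERBOSE)

# Without the flag, the leading newline and spaces become part of the pattern.
literal = re.compile(PATTERN)

for path in ('/afs/home', '/proc/1', '/home/user'):
    print(path, bool(verbose.match(path)), bool(literal.match(path)))
```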

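It's very straightforward. Python automatically concatenates adjacent string literals, so you can split the pattern into several raw strings inside the parentheses, with no backslashes needed: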

patterns = re.compile(r'(^/afs|'
                      r'^/pkg|'
                      r'^/bin|'
                      r'^/dev|'
                      r'^/etc|'
                      r'^/usr|'
                      r'^/usr2|'
                      r'^/sys|'
                      r'^/sbin|'
                      r'^/proc)')

I recommend using a decent Python editor or IDE, as these will typically do this for you automatically. I happen to use PyCharm, which does, but I'm sure VSCode and other popular IDEs and code editors can do a good job at this as well.
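A quick check that the split form builds exactly the same pattern string as the original one-liner. Adjacent literals are joined when the code is compiled. This is a sketch, and `/sbin/init` is just an illustrative input.

```python
import re

one_line = r'(^/afs|^/pkg|^/bin|^/dev|^/etc|^/usr|^/usr2|^/sys|^/sbin|^/proc)'

# Adjacent raw-string literals inside the parentheses are joined into one string.
split = (r'(^/afs|'
         r'^/pkg|'
         r'^/bin|'
         r'^/dev|'
         r'^/etc|'
         r'^/usr|'
         r'^/usr2|'
         r'^/sys|'
         r'^/sbin|'
         r'^/proc)')

print(split == one_line)  # True

patterns = re.compile(split)
print(patterns.match('/sbin/init'))
```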

Grismar

Python regular expressions have a verbose mode that lets you write the pattern as a multi-line triple-quoted string with comments. Also note that re.match (as opposed to re.search) only matches at the start of the string, so it implies a leading ^, and the shared / doesn't need to be repeated.
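A small sketch of the re.match versus re.search point, using a shortened, illustrative pattern and paths. It also shows one caveat: with re.MULTILINE, `^` in a search can match after any newline, but re.match still only tries the very start of the string.

```python
import re

# Shortened, illustrative pattern: the shared '/' is written once.
prefix = re.compile(r'/(afs|usr)')

print(prefix.match('/afs/data'))   # match() tries position 0 only, and succeeds here
print(prefix.match('/home/afs'))   # None: '/afs' is not at the start
print(prefix.search('/home/afs'))  # search() scans the string and finds '/afs' at index 5

# With re.MULTILINE, '^' also matches after each newline, but match() still
# only looks at the very start of the string.
text = 'first line\n/afs/data'
print(re.search(r'^/afs', text, re.MULTILINE))  # found at index 11
print(re.match(r'/afs', text, re.MULTILINE))    # None
```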

You can add a leading (?x) to the pattern or use the flags re.VERBOSE or its alias re.X to turn on verbose mode:

import re

patterns = re.compile(r'''(?x)/(usr2|     # comments also supported
                               sbin|
                               proc|
                               afs|
                               pkg|       # better to put longer matches first
                               bin|
                               dev|
                               etc|
                               usr|       # if before /usr2, would match this first
                               sys)''')

for trial in ('/afs/blah','/usr2/blah'):
    print(patterns.match(trial))

Output:

<re.Match object; span=(0, 4), match='/afs'>
<re.Match object; span=(0, 5), match='/usr2'>
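The ordering comments in the pattern can be checked directly. Python's `|` tries alternatives from left to right and keeps the first one that succeeds, not the longest. This matters whenever you use the matched text; a plain yes/no check is unaffected. A small sketch:

```python
import re

short_first = re.compile(r'/(usr|usr2)')
long_first = re.compile(r'/(usr2|usr)')

path = '/usr2/blah'
# Leftmost alternative wins: 'usr' already succeeds, so the engine never needs to try 'usr2'.
print(short_first.match(path).group())  # /usr
print(long_first.match(path).group())   # /usr2

# For a simple "does it start with one of these" test, both give a match.
print(bool(short_first.match(path)), bool(long_first.match(path)))  # True True
```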

Also note that whitespace and comments are ignored in verbose mode, so if you have significant whitespace, match it explicitly with \s or other escape codes.
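For example, a literal space in a verbose pattern is silently dropped. An escaped space, a character class, or `\s` keeps it significant, and the inline `(?x)` form behaves the same as passing re.VERBOSE. This is a sketch with a hypothetical path containing a space.

```python
import re

path = '/my docs/notes.txt'  # hypothetical path with a space in it

dropped = re.compile(r'/my docs', re.VERBOSE)     # space ignored: behaves like '/mydocs'
escaped = re.compile(r'/my\ docs', re.VERBOSE)    # escaped space is kept
in_class = re.compile(r'/my[ ]docs', re.VERBOSE)  # whitespace inside [] is kept
any_ws = re.compile(r'/my\sdocs', re.VERBOSE)     # \s matches any whitespace character
inline = re.compile(r'(?x)/my\ docs')             # same as escaped, flag given inline

print(dropped.match(path))          # None
print(dropped.match('/mydocs/x'))   # matches '/mydocs'
for p in (escaped, in_class, any_ws, inline):
    print(p.match(path).group())    # /my docs
```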

Mark Tolonen