How can I update this regex so that it matches even if bob isn't present in the line? (Python)

^(AllowUsers.*) (\bbob\b) ?(.*)$

My naive thought was to just add a "0 or 1" quantifier on capture group 2: (\bbob\b)?, but when I do that, lines that have bob somewhere other than the end no longer match:

### without "?" on capture group 2
# Match
AllowUsers bob
AllowUsers bob billybob
AllowUsers billybob bob
AllowUsers billybob steve bob eric bobby
AllowUsers billybob bobby steve bob eric
AllowUsers richard bob
AllowUsers richard bob jeremy
AllowUsers bob james richard jeremy
AllowUsers bob jeremy

# no match
AllowUsers james richard jeremy

### With "?" on capture group 2:
# All lines match, but `bob` is not captured unless it's at the end of the line:
AllowUsers bob               # `bob` captured
AllowUsers billybob bob      # `bob` captured
AllowUsers bob billybob      # `bob` not captured

My understanding of the regex (with ? on group 2) is:

  • ^(AllowUsers.*) : Match lines that start with AllowUsers and capture that plus anything after it (group 1), not including the space. This is greedy.
  • (\bbob\b)?: Match and capture bob (group 2), if it exists. We use word boundaries (\b) so that we don't incorrectly match, for example, billybob.
  • ?(.*)$: Match an optional space and capture anything thereafter (group 3).
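The behavior described above can be reproduced with Python's stdlib re module. This is a minimal sketch, assuming the "?" variant is written as `^(AllowUsers.*) (\bbob\b)? ?(.*)$`; the sample lines come from the test cases above. As far as I can tell, the greedy `.*` in group 1 runs to the end of the line and backtracks only as far as the last space, and because group 2 is optional, the engine accepts that first successful match without ever trying `bob` at an earlier position.

```python
import re

# Assumed form of the question's pattern with "?" added to capture group 2.
pattern = re.compile(r"^(AllowUsers.*) (\bbob\b)? ?(.*)$")

lines = [
    "AllowUsers bob",
    "AllowUsers billybob bob",
    "AllowUsers bob billybob",
]

for line in lines:
    match = pattern.match(line)
    # Group 1 grabs as much as it can and gives back only the last word,
    # so group 2 is only ever tested against the final word of the line.
    print(f"{line!r} -> {match.groups()}")
```

In the third line, group 1 ends up holding `AllowUsers bob`, which is why group 2 comes back empty even though `bob` is present.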

Here's the regex101 link: https://regex101.com/r/39zNfm/1

If I remove the "0 or 1" quantifier on (\bbob\b), then I match all lines that have bob in them and I get the correct capture groups, but I no longer match lines that don't have bob in them.

What am I misunderstanding?

Desired match and capture behavior

  • The regex should match any line that starts with AllowUsers, whether or not bob is present in the line.
  • If bob is not in the line, then capture the entire line. Do so in two groups: group 1 and group 3. It's OK if group 3 is empty.
  • If bob is in the line, then capture everything before bob (group 1), bob itself (group 2), and everything after bob (group 3).

For example:

[Image: test strings with their match and group results]

Background

I'm writing an Ansible task using the lineinfile builtin. The goal of this task is to add users to the AllowUsers directive of /etc/ssh/sshd_config.

With lineinfile, the regex used must match the line before and after modification so that you maintain idempotence.

In the end, the task would look like:

- name: "Allow {{ user }} to log in via SSH"
  lineinfile:
    path: '/etc/ssh/sshd_config'
    state: present
    regexp: '^(AllowUsers.*) (\b{{ user }}\b)?(\w*)$'  # not currently working
    line: '\1 {{ user }} \3'   # might have to fiddle with literal space. eg: '\1{{ user }}\3'
    backrefs: yes
  loop: "{{ ssh_users }}"
  loop_control:
    loop_var: user
dthor
  • Change it to `(\bbob\b)?` which is really nothing. Its very hard to tell what this compressed regex `^(AllowUsers.*) (\bbob\b) ?(.*)$` means as far as your intention. – sln Dec 21 '21 at 23:01
  • "Change it to `(\bbob\b)?`" I'm not sure what you mean by this. In the OP I mention that I do exactly that, but then it no longer captures 'bob' if it's found in the middle of the string. – dthor Dec 21 '21 at 23:36
  • "Its very hard to tell what this compressed regex ... means as far as your intention". I thought I was pretty clear in the "Desired behavior" section. Can you elaborate on what's unclear? – dthor Dec 21 '21 at 23:37

1 Answer

If I understand your question correctly, this does the trick:
^(AllowUsers.*)?(\bbob\b)|(.*)
See the regex demo and check the explanation on the right; the keys are ? and | (alternation).

EDIT:
With your updated test case of not matching anything that does not start with "AllowUsers " and the need to make that match as Group 1, here's a solution:
^(AllowUsers )(?>(.*)?(\bbob\b)(.*)|(.*))$
regex demo

EDIT #2:
After posting the above edit, I noticed the OP now asks for captures in three groups, so the pattern is further refined as follows:
^(?|(AllowUsers.+?(?=\bbob\b))(\bbob\b)(.*)|(AllowUsers .*))$
regex demo of Edit #2

SanV
  • Hot damn! That's almost exactly what I need. The only issue is that capture group **1** needs to have the start of the string in it ("AllowUsers ...") so that the backref replacement works correctly. As is your answer puts everything in group **3** (for lines that don't have "bob"). I'll play around and see what I can get. – dthor Dec 22 '21 at 00:21
  • Oh, also your regex will match lines that **don't** start with AllowUsers :-( I should have included that in my original test cases – dthor Dec 22 '21 at 00:27
  • yes. i'll work on it after a bit but if you fix it before, please post it here. – SanV Dec 22 '21 at 00:33
  • @dthor the last edit is likely what you're looking for. you may need to trim the extra space(s) in the captured groups. hope this helps. cheers! – SanV Dec 22 '21 at 05:21
  • Note for people seeing this in the future: python's stdlib `re` module does **not** support branch reset pattern. You must use the `regex` module (https://github.com/mrabarnett/mrab-regex). I'm going to mark this as the answer - now I just have to hack Ansible to use `regex` instead... – dthor Dec 27 '21 at 17:33
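
For readers limited to the stdlib re module, one possible alternative to the branch reset pattern (not taken from the answer above) is to stop group 1 from running past the user name with a negative lookahead. The sketch below is only an illustration: `build_pattern` and `allow_user` are hypothetical helper names, and the whitespace handling is an assumption about what a clean result line should look like.

```python
import re


def build_pattern(user: str) -> re.Pattern:
    """Hypothetical helper: three-group pattern that works with stdlib re."""
    word = rf"\b{re.escape(user)}\b"
    # Group 1 consumes characters only while `user` does not start, as a
    # whole word, at the current position, so it stops right before it.
    return re.compile(rf"^(AllowUsers(?:(?!{word}).)*)({word})?(.*)$")


def allow_user(line: str, user: str) -> str:
    """Hypothetical helper: return `line` with `user` listed in it."""
    match = build_pattern(user).match(line)
    if match is None:
        return line  # not an AllowUsers line; leave it untouched
    head, _, tail = match.groups()
    # Strip the spaces kept by groups 1 and 3, then rejoin with single spaces.
    parts = [head.strip(), user, tail.strip()]
    return " ".join(part for part in parts if part)


for line in ["AllowUsers bob billybob", "AllowUsers james richard jeremy"]:
    print(build_pattern("bob").match(line).groups(), "->", allow_user(line, "bob"))
```

With this pattern, group 1 keeps its trailing space and group 3 keeps its leading space, so a pure backreference template such as the `line:` value in the task would likely need extra care with spacing. The helper sidesteps that by stripping and rejoining, which also makes a second call on its own output return the same line.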