
I have this bash statement with perl regex:

echo $1 | perl -pe 's|(?:://).+?(?:/)|b|'

prints this:

httpbTesting/JS/displayName.js

from this:

http://php2-mindaugasb.c9.io/Testing/JS/displayName.js

I was expecting:

http://b/Testing/JS/displayName.js

Maybe I don't understand something about non-capturing groups? I thought they were supposed to match but not capture (like a positive lookahead and lookbehind combined). Am I mistaken?

Mindaugas Bernatavičius
  • Re "please also advise on how non-capturing groups work": Example 1) `/ab{2}/` matches strings containing `abb`, while `/(?:ab){2}/` matches strings containing `abab`. Example 2) `/ab|c/` matches strings containing `ab` and strings containing `c`, while `/a(?:b|c)/` matches strings containing `ab` and strings containing `ac` – ikegami Jan 09 '15 at 18:26

1 Answer


You should use:

perl -pe 's|(//).+?(/)|$1b$2|'

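A non-capturing group still consumes the input text it matches. The `(?:...)` parentheses only group part of the regex so that operators such as quantifiers or alternation can apply to it; they just don't store the matched text in `$1`, `$2`, and so on. Because `s|||` replaces the whole match, the `://` and the `/` are replaced along with everything between them.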

Or use lookarounds and avoid capturing groups:

echo "$1" | perl -pe 's|(?<=://).+?(?=/)|b|'
http://b/Testing/JS/displayName.js
anubhava
  • "Non capturing group doesn't mean that input text won't be consumed." please also advise on how they work - would appreciate it – Mindaugas Bernatavičius Jan 09 '15 at 17:50
  • Added more explanation. You can also check: http://www.regular-expressions.info/brackets.html – anubhava Jan 09 '15 at 17:53
  • or `s|://\K.+?(?=/)|b|` - `\K` (not available in really old perls) sets where the consumption starts – ysth Jan 09 '15 at 17:55
  • Don't want to add another question for non-capturing group explanation, but I still don't get it. Let me illustrate by example: – Mindaugas Bernatavičius Jan 09 '15 at 18:00
  • Still don't quite get it. How do they differ from a capturing group then, for example: (ab){2} "ab matches the characters ab literally (case sensitive)" and (?:ab){2} "ab matches the characters ab literally (case sensitive)" - both match twice, because of the quantifier (tried on: https://regex101.com). Can you explain the difference? – Mindaugas Bernatavičius Jan 09 '15 at 18:12
  • Difference is that in `(ab){2}` regex you have captured group back-reference available as `$1` but in `(ab){2}` you cannot use `$1` – anubhava Jan 09 '15 at 18:16
  • You wrote (ab){2} twice in the comment just above. [... ] One additional question: I have read this: http://stackoverflow.com/questions/21200514/regular-expression-matching-vs-capturing ---> but this is precisely why I thought that a non-capturing group works like a look-around - I thought it matches the contents in (?: * ) but disregards them when returning. So, returns are available as variables and they are returned by captures, however the non-captures are still matches, just not available as variables because they are not returned? – Mindaugas Bernatavičius Jan 10 '15 at 03:38
  • Sorry, I meant `(?:ab){2}` the 2nd time in the above comment. To answer your question: no, a non-capturing group doesn't discard what it matches. It just doesn't keep it available in the back-references `$1`, `$2`, `$3`, etc. – anubhava Jan 10 '15 at 03:42