I want to print both the gclid and the session named captures, but my regex is quitting as soon as it matches the gclid:

echo '"https://example.com/foo/?gclid=abc1234gef76786" session="765dsfsdf7657657khkjh"' | perl -nE '/(?<gclid>gclid=[^&"#\s]*)|(?<session>session=.*)/&&say"gclid: $+{gclid} session: $+{session}"'

Results in:

gclid: gclid=abc1234gef76786 session:

But I want:

gclid: gclid=abc1234gef76786 session: session="765dsfsdf7657657khkjh"

The echo is just an example line from one of millions that will be processed.

jaygooby
  • You are performing an "or" so it found `gclid` and quit; just like you told it to... – MonkeyZeus Sep 27 '19 at 12:34
  • Thanks, I get that. Just not sure what the combining operator should be. – jaygooby Sep 27 '19 at 13:17
  • Then you can use something like [`\?gclid=(?<gclid>.*?)".*?session="(?<session>.*?)"`](https://regex101.com/r/7dG3P3/1). This of course works for the exact example you provided, so if other lines deviate from your example's format, it would be in your best interest to provide multiple examples and explain what should and should not match. – MonkeyZeus Sep 27 '19 at 13:21

1 Answer

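That's because you are using the | (alternation) operator, so the match succeeds and stops as soon as either of the two alternatives matches, and only one of the named groups is ever set. Use .* between the two patterns instead, so that both must match in that order. Put \b before session to anchor it at a word boundary: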

perl -nE '/(?<gclid>gclid=[^&"#\s]*).*(?<session>\bsession=.*)/&&say"gclid: $+{gclid} session: $+{session}"'
blhsing