The goal is to use grep to find all URL paths that contain an underscore. Underscores that appear only in the query string should be ignored. Examples:

Positives:
/abc_bcd/def/
/foo_bar_foo
/image_bar?s=color
/foo_bar
/foo_remover?s=foo_bar

Negatives:
/foo
/foo/bar/foo
/foo/bar/foo?s=foo_bar
/foo-supersizer
/foosupersizer
/foobar
/foo-supersizer?s=foo_bar
foo_bar
foo bar bar_foo
foo_bar_foo

The regular expression below works in the regex tester, but passing the same pattern to grep (on macOS) returns no files, even though some files contain matching paths.

Regular expression: /^(?=[^?\s]*_)(?:\/[-a-zA-Z0-9()@:%_?\+.~#&=]+)+\/?$/gm

RegEx test: https://regex101.com/r/tIYoP7/3

Grep command: grep -r "^(?=[^?\s]*_)(?:\/[-a-zA-Z0-9()@:%_?\+.~#&=]+)+\/?$" .

Does grep require special formatting for regular expressions on macOS?

Crashalot

2 Answers

Given the sample input/output you posted, all you need is:

$ grep '^/[^?]*_' file
/abc_bcd/def/
/foo_bar_foo
/image_bar?s=color
/foo_bar
/foo_remover?s=foo_bar

If that isn't all you need, edit your question to provide more representative sample input and expected output, including cases where the above doesn't work.
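To check the pattern against the negatives as well, here is a minimal sketch that writes every example from the question into a temporary file and counts the matching lines. The variable names `sample` and `matches` are made up for this illustration.

```shell
# Hypothetical scratch file holding the positives and negatives from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/abc_bcd/def/
/foo_bar_foo
/image_bar?s=color
/foo_bar
/foo_remover?s=foo_bar
/foo
/foo/bar/foo
/foo/bar/foo?s=foo_bar
/foo-supersizer
/foosupersizer
/foobar
/foo-supersizer?s=foo_bar
foo_bar
foo bar bar_foo
foo_bar_foo
EOF

# ^/     the line must start with a slash
# [^?]*  any run of characters that are not a question mark
# _      an underscore, which therefore sits before any query string
matches=$(grep -c '^/[^?]*_' "$sample")
echo "$matches"

rm -f "$sample"
```

With this input the count should be 5, one per positive. To search a directory tree as in the question, the same pattern should also work with `grep -r '^/[^?]*_' .`, since it uses only basic regular expression syntax.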

Ed Morton

Without look-arounds, which require Perl-compatible regular expressions (and can't be used with macOS grep), you could do it in two steps:

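grep '^/.*_' infile | grep -v '^[^_]*?.*_'

The first part gets you all paths (lines starting with a /) that contain an _; the second part then removes the false positives where the _ exists only as part of the query string. In a basic regular expression, an unescaped ? is a literal question mark, so the second pattern matches lines whose first _ comes after a ?.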
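As a rough check of the two-step idea, the sketch below runs the pipeline on a few of the question's examples. It assumes the first grep is anchored with `^/` so that lines not starting with a slash are skipped; the `sample` and `result` names are hypothetical.

```shell
# Hypothetical scratch file with a handful of lines from the question.
sample=$(mktemp)
printf '%s\n' \
  '/foo_bar' \
  '/foo_remover?s=foo_bar' \
  '/foo/bar/foo?s=foo_bar' \
  '/foobar' \
  'foo_bar' > "$sample"

# Step 1: keep lines that start with "/" and contain "_" anywhere.
# Step 2: drop lines whose first "_" comes after a literal "?"
#         (an unescaped "?" is a literal character in a basic regex).
result=$(grep '^/.*_' "$sample" | grep -v '^[^_]*?.*_')
echo "$result"

rm -f "$sample"
```

This should print `/foo_bar` and `/foo_remover?s=foo_bar`: the third line is dropped by step 2, and the last two never pass step 1.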

Or, in a single step, with extended regular expressions:

grep -E '^/[^?]*_[^?]*(\?.*)?$' infile

This matches a leading /, then an _ surrounded by sequences of characters other than question marks, followed by an optional query string running to the end of the line. Because nothing before the _ can be a question mark, the matching underscore can't be part of the query string.
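A small sketch of the single-step approach on some of the question's examples follows. It uses the pattern with a leading `/` and a closing `$`, so that the whole line has to be a path plus an optional query string; the `sample` and `result` names are hypothetical.

```shell
# Hypothetical scratch file with two positives and two negatives from the question.
sample=$(mktemp)
printf '%s\n' \
  '/abc_bcd/def/' \
  '/image_bar?s=color' \
  '/foo-supersizer?s=foo_bar' \
  'foo_bar_foo' > "$sample"

# ^/            the line starts with a slash
# [^?]*_[^?]*   an underscore with no question mark anywhere before it
# (\?.*)?$      an optional query string running to the end of the line
result=$(grep -E '^/[^?]*_[^?]*(\?.*)?$' "$sample")
echo "$result"

rm -f "$sample"
```

Only the first two lines should be printed. The third has its underscore in the query string, and the fourth does not start with a slash.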

Benjamin W.