I have used git grep for years to search for fixed strings, but I haven't used it much for regular expression searches.

I have places in the code with non-localized strings. For example:

   JLabel label = buildLabel("Alphabet");

In this case buildLabel() is an inherited utility method. There are also buildBoldLabel(), buildMultiLineLabel(), and buildTextArea().

So I would like to search my code for calls to these methods that skip the lookup for the localized string. The correct call should be:

   JLabel label = buildLabel(getString("Alphabet"));

I am very familiar with regular expressions and I see that git grep supports Perl character classes. So I figured that it would be very easy:

$ git grep -P "buildLabel(\"\w+\")"

This returns no results. So I tried it without the Perl extension.

$ git grep "buildLabel(\"[a-zA-Z_]+\")"

Still ... no results. I verified that I could search with a fixed string.

$ git grep "buildLabel(\"Alphabet\")"

That returned the instance in the code that I already knew existed. However ...

$ git grep -P "buildLabel(\"Alphabet\")"

Returns no results.

I also tried changing the quote characters and got the same results.

$ git grep -P 'buildLabel("\w+")' ... no results

$ git grep -P 'buildLabel("Alphabet")' ... no results

$ git grep 'buildLabel("Alphabet")' ... 1 expected result

I tried on Linux with the same results.

UPDATE:

Thanks to @wiktor-stribiżew for pointing out in a comment that with PCRE the parentheses need to be escaped (I am always confused by that).

$ git grep -P 'buildLabel\("\w+"\)' ... returns 1 expected result.

However, why don't these work?

$ git grep 'buildLabel("[a-zA-Z_]+")'

$ git grep 'buildLabel\("[a-zA-Z_]+"\)'

$ git grep 'buildLabel\("[a-zA-Z_][a-zA-Z_]*"\)' (in case + isn't implemented)


So what am I doing wrong with git grep? Or is it broken?

FYI: I am using git version 2.35.1 from Homebrew on macOS Big Sur.

chrish
  • In PCRE regex, `(` and `)` must be escaped to match literal parentheses. It must be something like `git grep -P 'buildLabel\("\w+"\)'` – Wiktor Stribiżew Feb 16 '22 at 15:14
  • Re: "is it broken?" Chances are that the tool that has been used by millions every day for years is not what's broken. – Andy Lester Feb 16 '22 at 15:16
  • @AndyLester: Yeah, I can't believe it would be broken. But I couldn't figure out how to get it to work. It is POSSIBLE that something is broken. – chrish Feb 16 '22 at 15:21

1 Answer

Regex vs. fixed string search

Please refer to the git grep help:

-E
--extended-regexp
-G
--basic-regexp
Use POSIX extended/basic regexp for patterns. Default is to use basic regexp.

So, by default, git grep treats the pattern string as a POSIX BRE regex, not as a fixed string.

To make git grep treat the pattern as a fixed string you need -F:

-F
--fixed-strings
Use fixed strings for patterns (don’t interpret pattern as a regex).
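
A minimal sketch of the difference, assuming a scratch file demo.txt (a hypothetical name) and plain grep standing in for git grep so that it runs outside a repository; grep also defaults to BRE and accepts the same -F switch. It also hints at why the search in the question worked without -F: none of the characters in buildLabel("Alphabet") is special in BRE, so the regex happened to behave like a fixed string.

```shell
# Scratch directory and file; the names are made up for this example.
tmp=$(mktemp -d)
printf '%s\n' 'buildLabel("A.B")' 'buildLabel("AxB")' > "$tmp/demo.txt"

# Default (BRE): ( ) and " are literal, but . matches any character,
# so both lines are counted.
regex_hits=$(grep -c 'buildLabel("A.B")' "$tmp/demo.txt")

# -F: the pattern is a fixed string, so only the line with a real dot counts.
fixed_hits=$(grep -c -F 'buildLabel("A.B")' "$tmp/demo.txt")

echo "regex=$regex_hits fixed=$fixed_hits"
rm -rf "$tmp"
```

With git itself, the equivalent switches would be git grep (default) and git grep -F, run inside the repository.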

Regex issues

You can enable PCRE regex syntax with the -P option, and in that case you should refer to the PCRE documentation.

In your git grep -P "buildLabel(\"\w+\")", the parentheses must be escaped in order to be matched as literal parentheses, i.e. it should be git grep -P "buildLabel\(\"\w+\"\)".

In git grep 'buildLabel("[a-zA-Z_]+")', you are using POSIX BRE, where + is parsed as a literal + char, not as a one-or-more quantifier. You can use git grep 'buildLabel("[a-zA-Z_]\{1,\}")' in POSIX BRE though. With GNU grep you could also use \+, as in git grep 'buildLabel("[a-zA-Z_]\+")', but that is an extension rather than part of POSIX BRE, and I am not sure it works with git.

git grep 'buildLabel\("[a-zA-Z_]+"\)' does not work because, in POSIX BRE, \(...\) (an escaped pair of parentheses) defines a capturing group and thus does not match literal parentheses.

git grep -e 'buildLabel\("[a-zA-Z_][a-zA-Z_]*"\)' is still POSIX BRE. To make it POSIX ERE, you need the -E option: git grep -E 'buildLabel\("[a-zA-Z_][a-zA-Z_]*"\)'. Or use git grep -E 'buildLabel\("[a-zA-Z_]+"\)', since the unescaped + is a quantifier in POSIX ERE.
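
The BRE and ERE cases above can be checked side by side. This is only a sketch under assumptions: Sample.java is a made-up scratch file, and plain grep stands in for git grep (both default to BRE and both take -E), so a particular git build may behave differently for non-POSIX extensions such as \+.

```shell
# Scratch file standing in for the real source tree (hypothetical name).
tmp=$(mktemp -d)
cat > "$tmp/Sample.java" <<'EOF'
JLabel a = buildLabel("Alphabet");
JLabel b = buildLabel(getString("Alphabet"));
EOF

# BRE (default): ( and ) are literal, \{1,\} means "one or more".
bre=$(grep -c 'buildLabel("[a-zA-Z_]\{1,\}")' "$tmp/Sample.java")

# ERE (-E): ( and ) must be escaped to be literal, + is a quantifier.
ere=$(grep -c -E 'buildLabel\("[a-zA-Z_]+"\)' "$tmp/Sample.java")

# BRE with \( \): this is a capturing group, so the pattern looks for
# buildLabel"Alphabet" with no parentheses and finds nothing.
# grep exits non-zero on zero matches; || true keeps the script going.
grp=$(grep -c 'buildLabel\("[a-zA-Z_]\{1,\}"\)' "$tmp/Sample.java" || true)

echo "BRE=$bre ERE=$ere BRE-group=$grp"
rm -rf "$tmp"
```

Only the unlocalized call on the first line is counted by the BRE and ERE patterns; the getString(...) call is skipped, which is the search the question was after.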

Also, see What special characters must be escaped in regular expressions?

Wiktor Stribiżew
  • I have been developing for almost 30 years and I thought I understood RegEx. I guess my knowledge is pre-POSIX. Thanks for the explanation! – chrish Feb 16 '22 at 15:58
  • The `-e` in the previous example was not intended to be `-E`. It was just an accidental inclusion from trying things out with the `-e` and without it (the documentation has `-e` before `<pattern>` and it seems to be optional). – chrish Feb 16 '22 at 15:59
  • @chrish `-e` just means the next thing is the pattern. If the pattern does not start with `-`, it is optional. – Wiktor Stribiżew Feb 16 '22 at 16:03
  • I think you removed a link (https://www.regular-expressions.info/posix.html) which was more helpful for explaining POSIX BRE and ERE with regard to + and ? matching. Specifically this quote: "Some implementations support \? and \+ as an alternative syntax to \{0,1\} and \{1,\}, but \? and \+ are not part of the POSIX standard." Maybe I found the link through another post, but I thought it was in here. – chrish Feb 16 '22 at 16:22
  • @chrish I did not remove it as I did not add it. Surely, [that regex site](https://www.regular-expressions.info/posix.html) is really helpful, too. – Wiktor Stribiżew Feb 16 '22 at 16:24
  • @chrish: yes, REs get absurdly complex when you consider all the different flavors ("regex buddy ... 269 flavors ..."!). Git can do Perl REs but only if compiled with Perl support. – torek Feb 17 '22 at 02:47