3

Here are the commands I ran in the bash shell. How is the file mask supposed to be specified, if not this way? I expected both commands to find the search expression, but only the first one does. In this example, I know in advance that I want to restrict the search to Python source files only, because unqualified searches are silly time wasters.

So, this works as expected:

grep -rni '/home/ga/projects' -e 'def Pr(x,u,v)'

/home/ga/projects/anom/anom.py:27:def Pr(x,u,v): blah, blah, ...

but this won't work:

grep --include=\*.{py} -rni '/home/ga/projects' -e 'def Pr(x,u,v)'

I'm using GNU grep version 2.16.

Paulo Mattos
Geoffrey Anderson
  • 3
    What do you think the braces around `py` are doing? – chepner May 22 '17 at 18:09
  • 1
    A list containing 1 or more items has been a useful artifact in my past work here. Depending on the day and the problem, I have 1 or more file extensions to search through, e.g., .py today, .py and .ipynb tomorrow, .R some other day, and then .R and .Rmd and .ipynb on the 4th day, because Jupyter now supports R code but still uses the .ipynb extension, a legacy of Jupyter's origins in the IPython codebase. – Geoffrey Anderson May 23 '17 at 16:23
  • Brace expansion doesn't support 1-item lists. Those braces are being passed to `grep` literally. – chepner May 23 '17 at 17:09

3 Answers

3

--include=\*.{py} looks like a broken attempt to use brace expansion (an unquoted {...} expression).

However, for brace expansion to occur in bash (and ksh and zsh), you must either have:

  • a list of at least 2 items, separated with ,; e.g. {py,txt}, which expands to 2 arguments, py and txt.

  • or, a range of items formed from two end points, separated with ..; e.g., {1..3}, which expands to 3 arguments, 1, 2, and 3.
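The two rules above can be checked without grep at all, because brace expansion is performed by the shell before the command runs. The following is a minimal sketch, assuming bash; the variable names `list`, `range`, and `single` are made up for illustration.

```shell
#!/usr/bin/env bash
# printf repeats its format once per argument, so the angle brackets
# show exactly how many arguments each brace form expands to.

list=$(printf '<%s>' {py,txt})    # 2-item list: expanded by the shell
range=$(printf '<%s>' {1..3})     # range: expanded by the shell
single=$(printf '<%s>' {py})      # single item: braces are left untouched

echo "list:   $list"     # list:   <py><txt>
echo "range:  $range"    # range:  <1><2><3>
echo "single: $single"   # single: <{py}>
```

The last line is the case from the question: the braces survive and are handed to the command literally.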

Thus, with a single item, simply do not use brace expansion:

--include=\*.py

If you did have multiple extensions to consider, e.g., *.py as well as *.pyc files, here's a robust form that illustrates the underlying shell features:

'--include=*.'{py,pyc}

Here:

  • Brace expansion is applied, because {...} contains a 2-item list.
  • Since the {...} directly follows the literal (single-quoted) string --include=*., the results of the brace expansion include the literal part.
  • Therefore, 2 arguments are ultimately passed to grep, with the following literal content:
    • --include=*.py
    • --include=*.pyc
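To see the two resulting options in action, here is a small self-contained sketch, assuming bash and a grep that supports `--include` (such as the GNU grep from the question). The temporary directory, the file `notes.txt`, and the names `demo`, `include_opts`, and `matches` are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Build a throwaway tree so the example does not depend on real files.
demo=$(mktemp -d)
printf 'def Pr(x,u,v): pass\n' > "$demo/anom.py"
printf 'def Pr(x,u,v): pass\n' > "$demo/anom.pyc"
printf 'def Pr(x,u,v): pass\n' > "$demo/notes.txt"

# The quoted prefix stays literal; the unquoted braces expand into
# two separate --include options, one per extension.
include_opts=( '--include=*.'{py,pyc} )
printf 'option: %s\n' "${include_opts[@]}"

# -l prints only the names of matching files; sort fixes the order.
matches=$(grep -rl "${include_opts[@]}" -e 'def Pr(x,u,v)' "$demo" | sort)
echo "$matches"

rm -rf "$demo"
```

Collecting the options in an array first makes it easy to print them and confirm what grep will actually receive.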
mklement0
  • 1
    @PauloMattos: Thanks, though I wouldn't necessarily call it a _hack_ - it's a regular application of brace expansion. – mklement0 May 22 '17 at 19:09
  • 1
    Agreed. Before checking back on this website, I discovered that the command finally worked correctly once I removed the braces, which is the right form when there is only a single item to search for. – Geoffrey Anderson May 23 '17 at 16:18
2

Your command fails because of the braces '{}'. They are passed to grep literally, so grep only searches files whose names literally end in '.{py}'. You can create a file such as 'myscript.{py}' to convince yourself: you'll see it appear in the results.

The correct option parameter would be '*.py' or the equivalent \*.py. Either form protects the * from being (mis)interpreted by the shell.
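The experiment suggested above can be run as a short script. This is a sketch under stated assumptions: bash, a grep that supports `--include`, and a temporary directory with made-up file names (`real.py`, `myscript.{py}`); the variables `demo`, `with_braces`, and `without_braces` are likewise only illustrative.

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'def Pr(x,u,v): pass\n' > "$demo/real.py"
printf 'def Pr(x,u,v): pass\n' > "$demo/myscript.{py}"

# Single-item braces are not expanded, so grep receives the glob *.{py}
# and only considers the file whose name literally ends in ".{py}".
with_braces=$(grep -rl --include=\*.{py} -e 'def Pr(x,u,v)' "$demo")

# Quoted glob without braces: only the real Python file is considered.
without_braces=$(grep -rl --include='*.py' -e 'def Pr(x,u,v)' "$demo")

echo "with braces:    $with_braces"
echo "without braces: $without_braces"

rm -rf "$demo"
```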

On the other hand, I would advise using the find command for such jobs:

find /home/ga/projects -regex '.*\.py$' -exec grep -Hni -e 'def Pr(x,u,v)' {} +

That will protect you from hard-to-understand shell behaviour.
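For the multi-extension case raised in the question's comments, find can combine several name tests with `-o`. The following is a minimal sketch rather than the answer's own command: it swaps `-regex` for `-name` tests, runs against a temporary directory, and uses made-up file and variable names (`demo`, `found`).

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)
printf 'def Pr(x,u,v): pass\n' > "$demo/anom.py"
printf 'def Pr(x,u,v): pass\n' > "$demo/anom.ipynb"
printf 'def Pr(x,u,v): pass\n' > "$demo/anom.R"

# The escaped parentheses group the two -name tests so that -type f
# and -exec apply to both; the quotes keep the shell from expanding
# the globs before find sees them.
found=$(find "$demo" -type f \( -name '*.py' -o -name '*.ipynb' \) \
          -exec grep -l -e 'def Pr(x,u,v)' {} + | sort)
echo "$found"

rm -rf "$demo"
```

No brace expansion is involved here, so adding or removing an extension never changes how the shell parses the command.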

Adrien H
  • 3
    Using a backslash is equivalent to using quotes — it protects the string from misinterpretation. That said, `--include=*.py` (no quotes, no backslashes) would work unless you have a file called, for example, `--include=abc.py` in your directory, which is pretty implausible (or you set your shell options inappropriately). – Jonathan Leffler May 22 '17 at 18:33
  • 3
    If you're going to use `find`, use `+` in place of `\;` so that `find` runs `grep` as infrequently as possible (rather than once per matching file name which the semicolon requires). – Jonathan Leffler May 22 '17 at 18:34
  • 1
    Anything involving finding files is not "pure grep", it's an ugly hack provided by the GNU guys for reasons beyond mortal ken. `grep` was created to `Globally find a Regular Expression in a file and Print the result`. There already was (and still is) a perfectly good tool for `find`ing files before the GNU guys muddied the waters. – Ed Morton May 22 '17 at 20:59
1

Try it like this (using quotes to be safe; they are also more readable than backslash escaping, IMHO):

grep --include='*.py' ...

Your \*.{py} pattern doesn't work because grep does not understand brace syntax at all. Please see the comments below for the full investigation. For the record, blame this answer for the resulting brace wars ;)

By the way, brace expansion generally works fine in Bash. See mklement0's answer for more details.


Ack. As an alternative, you might consider switching to ack from now on. It's a tool much like grep, but optimized for programmers.

It's a great fit for what you are doing. A nice quote about it:

Every once in a while something comes along that improves an idea so much, you can't ignore it. Such a thing is ack, the grep replacement.

Paulo Mattos
  • 1
    Do the braces really only work when there's at least one comma? – Jonathan Leffler May 22 '17 at 18:14
  • 1
    @JonathanLeffler I was surprised too :( But I tested on macOS and it really matters... – Paulo Mattos May 22 '17 at 18:24
  • 1
    ...and I'm seeing this same behavior in pure Bash *expansion* as well (e.g., `echo *.py` not the same as `echo *.{py}`; the latter just outputs `*.{py}`.) – Paulo Mattos May 22 '17 at 18:29
  • Curious — it does indeed matter. Bash (3.2.57 — distributed by default on macOS Sierra 10.12.5; also Bash 4.3.42) requires a comma too, much to my surprise. At least `grep` is consistent with Bash. – Jonathan Leffler May 22 '17 at 18:29
  • 1
    I don't think `grep` understands the `{}` syntax at all. `grep --include='*.{py,html,php}'` will match only files that _literally_ have extension `.{py,html,php}`. Without quoting, only Bash comes into play, and, indeed, the list form of brace expansion requires a comma. – mklement0 May 22 '17 at 18:34
  • @JonathanLeffler Running **Bash 3.2.57(1)-release** here... man, that pretty much summarizes my *love and hate* relationship with Bash :-( – Paulo Mattos May 22 '17 at 18:35
  • 2
    @mklement0: You seem to be correct — the braces work with commas in Bash but not in `grep`, and the braces don't work as you'd expect in Bash when there isn't at least one comma (or a range expression). A curious piece of non-generality, but [brace expansion](https://www.gnu.org/software/bash/manual/bash.html#Brace-Expansion) is pretty weird/limited anyway. (Am I allowed to add 'but it comes from C shell, which is pretty weird/limited too'?) – Jonathan Leffler May 22 '17 at 18:37
  • @mklement0 I think you are right man! I'm running more tests here, on macOS Sierra, and indeed the `'*.{py,html,php}'` isn't working at all with `grep`. – Paulo Mattos May 22 '17 at 18:37
  • 1
    Interesting about macOS vs other Unix. I wonder if it depends on whether the POSIX-based library function [`fnmatch()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fnmatch.html) supports the notation, or whether `grep` builds it into its own code. Or whether the other answer was produced under the influence — I just dug out a Centos 7 VM and `grep --include='*.{c,h}' -rlw -e include .` printed nothing, but not using the braces (once, `.c` only; twice, `.c` and `.h`) worked. – Jonathan Leffler May 22 '17 at 18:45
  • @JonathanLeffler ...as someone once noted: *"The nice thing about standards is that you have so many to choose from."* :-) – Paulo Mattos May 22 '17 at 18:50
  • @JonathanLeffler: Re weirdness: while the single-item issue is debatable, it's the inability to use _variable references_ to drive a range expansion that makes range brace expansions awkward. Only `bash` has this limitation, however - both `ksh` and `zsh` do support variables in that scenario. – mklement0 May 22 '17 at 19:14
  • 1
    @mklement0: It's messy, but you're right. `printf '%s\n' /opt/gnu/bin/grep -r -l -w -e include --include=*.{c,h} .` generates output including `--include=*.c` and `--include=*.h`; file name generation at work. With `--include=*.{c}`, the output includes `--include=*.{c}`; file name generation not working. Having created `> '--include=abc.h'`, the first variant generates `--include=*.c` and `--include=abc.h` — this drives beginners to despair (and old hands to the bar). Does `tcsh` allow variables in brace expansion ranges? No: `set x=5 y=9` and `echo abc.{$x..$y}` yields `abc.5..9`. – Jonathan Leffler May 22 '17 at 19:23
  • @JonathanLeffler: Thanks for that. I've just posted an [answer](https://stackoverflow.com/a/44121121/45375) to the linked question that tries to cover these subtleties. – mklement0 May 22 '17 at 20:02