127

When I want to grep all the HTML files in a directory, I do the following:

grep --include="*.html" pattern -R /some/path

This works well. But how do I grep all the html, htm, and php files in a directory?

From the question "Use grep --exclude/--include syntax to not grep through certain files", it seems that I can do the following:

grep --include="*.{html,php,htm}" pattern -R /some/path

Sadly, this does not work for me.
FYI, my grep version is 2.5.1.

tianyapiaozi

7 Answers

174

You can use multiple --include flags. This works for me:

grep -r --include=*.html --include=*.php --include=*.htm "pattern" /some/path/

Alternatively, you can do as Deruijter suggested and drop the quotes. This also works for me:

grep -r --include=*.{html,php,htm} "pattern" /some/path/

Don't forget that you can use find and xargs for this sort of thing too:

find /some/path/ -name "*.htm*" -or -name "*.php" | xargs grep "pattern"
BSMP
Steve
  • I see the problem. I used --include="*.{html,php}" to keep the shell from expanding '*', which at the same time stops the shell from expanding {html,php}. It seems that the equals sign in --include=* is enough to keep the shell from expanding '*'. – tianyapiaozi May 17 '12 at 04:53
  • xargs isn't really a substitute; lots of times when you need this feature, you're dealing with more files than xargs will handle. – James Moore Aug 14 '14 at 17:54
  • @JamesMoore: Take a look at [GNU Parallel](https://www.gnu.org/software/parallel/). It can often be used as a substitute for `xargs`. [This](https://www.gnu.org/software/parallel/man.html#differences_between_xargs_and_gnu_parallel) is also worth a quick read. HTH. – Steve Aug 14 '14 at 23:52
  • @tianyapiaozi: You are correct that the quoting around the brace expansion is the problem; without the quoting, however, `*` is still subject to globbing _as part of the token it is embedded in_. It just _happens_ not to match anything in this case, because only files _literally_ named something like `--include=foo.html` would match. To be safe, quote the `*` (which you can do individually with `\*`). As an added bonus, this makes it _visually_ clearer that it is _not the shell_ that should perform the globbing in this case. – mklement0 May 22 '17 at 20:30
  • As for the `find` solution: using `-exec grep "pattern" {} +` instead of `| xargs grep "pattern"` is more robust (handles filenames with spaces, for instance) as well as more efficient. – mklement0 May 22 '17 at 20:33
  • what @mklement0 wrote is true, especially because find with -exec is aware of the maximum command-line length accepted by xargs and will chunk the file names and keep the xargs command length legal, while using a pipe can exceed the maximum command line length – simpleuser Nov 29 '17 at 17:00
  • @simpleuser: To be clear: you'd use `-exec` _instead_ of `xargs` - `find -exec` has `xargs` logic _built in_ (perhaps that's what you meant). As for pipes: as long as the arguments flow via stdin as opposed to being passed on the command line (whether with or without a pipeline), there shouldn't be a problem (due to `xargs`' intelligent behind-the-scenes argument chunking). – mklement0 Nov 29 '17 at 17:32
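
The two more robust variants discussed in these comments can be tried out with a rough sketch like the one below. The scratch directory, the file names, and the search string `needle` are made up for illustration, and `-print0` / `xargs -0` are assumed to be available (they are in GNU and BSD tools). The plain `| xargs grep` pipe would split the name `page two.php` at the space.

```shell
# Build a throwaway tree (hypothetical file names) to try the commands on.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
printf 'needle\n' > "$demo_dir/index.html"
printf 'needle\n' > "$demo_dir/sub dir/page two.php"
printf 'needle\n' > "$demo_dir/notes.txt"

# Variant 1: find batches the file names itself; no pipe is involved.
via_exec=$(find "$demo_dir" -type f \( -name '*.htm*' -o -name '*.php' \) \
    -exec grep -l 'needle' {} + | sort)

# Variant 2: NUL-delimited names keep xargs from splitting on whitespace.
via_xargs=$(find "$demo_dir" -type f \( -name '*.htm*' -o -name '*.php' \) -print0 \
    | xargs -0 grep -l 'needle' | sort)

# Both variants should list the same two files; notes.txt is never searched.
printf '%s\n' "$via_exec"
rm -rf "$demo_dir"
```

The parentheses around the `-name` tests matter: without them, `-type f` and the action would bind only to the last `-name` test.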
45

tl;dr

# Works in bash, ksh, and zsh.
grep -R '--include=*.'{html,php,htm} pattern /some/path

Using {html,php,htm} can only work as a brace expansion, which is a nonstandard (not POSIX-compliant) feature of bash, ksh, and zsh.

  • In other words: do not try to use it in a script that targets /bin/sh - use explicit multiple --include arguments in that case.

  • grep itself does not understand {...} notation.
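
For a script that targets /bin/sh, one possible way to avoid typing every option by hand is to build the --include arguments in a loop. This is only a sketch: the scratch directory, file names, and the search string `needle` are invented for the demo, and it assumes a grep that supports -r and --include (GNU grep does; these options are not required by POSIX).

```shell
#!/bin/sh
# Hypothetical sandbox so the sketch can be run as-is.
demo_dir=$(mktemp -d)
printf 'needle\n' > "$demo_dir/a.html"
printf 'needle\n' > "$demo_dir/b.php"
printf 'needle\n' > "$demo_dir/c.txt"

# Build one --include option per extension using only POSIX sh features.
set --
for ext in html php htm; do
    set -- "$@" "--include=*.$ext"
done

# "$@" expands to the three options, each as a separate argument that the
# shell does not glob, because the expansion is double-quoted.
matches=$(grep -rl "$@" 'needle' "$demo_dir" | sort)
printf '%s\n' "$matches"
rm -rf "$demo_dir"
```

Note that `set --` replaces the script's own positional parameters, so in a real script this would need to happen after any command-line arguments have been consumed.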

For a brace expansion to be recognized, it must be an unquoted (part of a) token on the command line.

A brace expansion expands to multiple arguments, so in the case at hand grep ends up seeing multiple --include=... options, just as if you had passed them individually.

The results of a brace expansion are subject to globbing (filename expansion), which has pitfalls:

  • Each resulting argument could further be expanded to matching filenames if it happens to contain unquoted globbing metacharacters such as *.
    While this is unlikely with tokens such as --include=*.html (e.g., you'd have to have a file literally named something like --include=foo.html for something to match), it is worth keeping in mind in general.

  • If the nullglob shell option happens to be turned on (shopt -s nullglob) and globbing matches nothing, the argument will be discarded.
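
The nullglob pitfall can be observed with a small experiment along these lines. It is a sketch that assumes bash is installed (it is invoked explicitly, since `shopt` is bash-specific) and uses an empty scratch directory so the glob is guaranteed to match nothing; the variable names are arbitrary.

```shell
# Empty scratch directory: no file can match the glob '--include=*.html'.
scratch=$(mktemp -d)

# Unquoted: with nullglob on, the non-matching word is removed entirely,
# so 'set --' ends up with zero positional parameters.
unquoted_count=$(cd "$scratch" && bash -c 'shopt -s nullglob; set -- --include=*.html; echo $#')

# Quoted: the word is never treated as a glob, so it survives as one argument.
quoted_count=$(cd "$scratch" && bash -c 'shopt -s nullglob; set -- "--include=*.html"; echo $#')

printf 'unquoted: %s argument(s), quoted: %s argument(s)\n' \
    "$unquoted_count" "$quoted_count"
rm -rf "$scratch"
```

In the unquoted case grep would silently lose its --include option and search every file.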

Therefore, for a fully robust solution, use the following:

grep -R '--include=*.'{html,php,htm} pattern /some/path
  • '--include=*.' is treated as a literal, due to being single-quoted; this prevents inadvertent interpretation of * as a globbing character.

  • {html,php,htm}, the - of necessity - unquoted brace expansion[1], expands to 3 arguments, which, due to {...} directly following the '...' token, include that token.

  • Therefore, after quote removal by the shell, the following 3 literal arguments are ultimately passed to grep:

    • --include=*.html
    • --include=*.php
    • --include=*.htm

[1] More accurately, only the syntax-relevant parts of the brace expansion must be unquoted; the list elements may still be individually quoted, and must be if they contain globbing metacharacters that could result in unwanted globbing after the brace expansion. While not necessary in this case, the above could be written as:
'--include=*.'{'html','php','htm'}
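
To see which arguments grep would actually receive, the expansions can be passed to printf instead, which repeats its format once per argument. The following is a sketch that assumes bash is installed; the here-document is fed to bash explicitly because brace expansion is not guaranteed in a plain POSIX sh, and the variable name `args_seen` is arbitrary.

```shell
# Collect what bash passes to a command for each quoting style.
args_seen=$(bash <<'EOF'
# Entirely quoted: the braces stay literal, so printf receives ONE argument.
printf '<%s>\n' "--include=*.{html,php,htm}"
# Quoted prefix followed by unquoted braces: printf receives THREE arguments.
printf '<%s>\n' '--include=*.'{html,php,htm}
EOF
)
printf '%s\n' "$args_seen"
```

The first output line shows the single literal argument from the fully quoted form; the remaining three lines are the separate --include=*.html, --include=*.php, and --include=*.htm arguments.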

mklement0
  • Thank you very much for this post. Great posts not only answer the question but teach you something new! This is especially useful for those of us writing on something that needs to be POSIX compliant. Anybody using Mac OS X should look here! – sabalaba Jun 03 '17 at 07:36
  • @sabalaba: I'm glad to hear it, but to be clear: while brace expansion is not POSIX-compliant, it works with `bash` on any platform that `bash` runs on. – mklement0 Jun 04 '17 at 03:23
10

Try removing the double quotes:

grep --include=*.{html,php,htm} pattern -R /some/path
Deruijter
5

This serves the same purpose without the --include option, and it works on grep 2.5.1 as well.

grep -v -E ".*\.(html|htm|php)"
Kohei Mikami
4

Does this not work?

  grep pattern  /some/path/*.{html,php,htm} 
Vijay
2

Try this. -r will do a recursive search. -s will suppress file not found errors. -n will show you the line number of the file where the pattern is found.

    grep "pattern" <path> -r -s -n --include=*.{c,cpp,C,h}
Pradeep
  • This is the best answer for me particularly, and I think you can put -rsn instead of -r -s -n (but that's nitpicking). – slim Aug 19 '16 at 14:40
  • Usually I use **-rns**. For clarity in the example I had to mention **-r -n -s** :-) Glad that it helped. – Pradeep Aug 20 '16 at 20:59
  • I recommend adding `-I` to the standard set. It skips binary files (which are hardly ever searched) and hence boosts efficiency. Then we get `grep -rIns ...`, which acoustically plays nicely :) – bloody May 01 '20 at 20:36
  • This does search through every file, not just the ones that match the expression regex. It's accurate, but not efficient when you know the extension, or another way to identify the file by name. – Wexxor Nov 06 '20 at 22:30
0

Use grep with the find command:

find /some/path -type f \( -name '*.html' -o -name '*.htm' -o -name '*.php' \) \
 -exec grep PATTERN {} +

You can use -regex and -regextype options too.
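
A -regex variant might look like the sketch below. It assumes GNU find, since -regextype is a GNU extension and is not available in every find implementation; the scratch directory, file names, and the search string `needle` are invented for the demo.

```shell
# Hypothetical sandbox with three wanted files and one near miss.
demo_dir=$(mktemp -d)
printf 'needle\n' > "$demo_dir/a.html"
printf 'needle\n' > "$demo_dir/b.htm"
printf 'needle\n' > "$demo_dir/c.php"
printf 'needle\n' > "$demo_dir/d.phps"

# -regex must match the WHOLE path, hence the leading '.*'.
# -regextype only affects -regex tests that come after it on the command line.
regex_matches=$(find "$demo_dir" -regextype posix-extended -type f \
    -regex '.*\.(html|htm|php)' -exec grep -l 'needle' {} + | sort)

# d.phps is not listed, because the regex has to match up to the end of the path.
printf '%s\n' "$regex_matches"
rm -rf "$demo_dir"
```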

Prince John Wesley