4

We have a rather large and complex file system, and I am trying to generate a list of files containing a particular text string. This should be simple, but I need to exclude the './.svn' and './pdv' directories (and probably others) and to look only at files of type *.p, *.w or *.i*.

I can easily do this with a program, but it is proving very slow to run. I want to speed up the process (so that I'm not searching thousands of files repeatedly) as I need to run such searches against a long list of criteria.

Normally, we search the file system using:

find . -name "*.[!r]*" -exec grep -i -l "search for me" {} \;

This works, but I then have to use a program to exclude the unwanted directories, so it runs very slowly.

After looking at the topics here: Stack Overflow thread

I've decided to try a few other approaches:

grep -ilR "search for me" . --exclude ".svn" --excluse "pdv" --exclude "!.{p,w,i*}" 

Excludes the '.svn' directory, but not the 'pdv' directory. Doesn't limit the files looked at.

grep -ilR "search for me" . --exclude ".svn" --excluse "pdv" --include "*.p" 

Excludes the '.svn' directory, but not the 'pdv' directory. Doesn't limit the files looked at.

find . -name "*.[!r]*" -exec grep -i -l ".svn" | grep -i -l "search for me" {} \;

I can't even get this (or variations on it) to run successfully.

find . ! -name "*.svn*" -prune -print -exec grep -i -l "search for me" {} \;

Doesn't return anything. It looks like it stops as soon as it finds the .svn directory.

Colin
  • Explicitly, I'm looking for all files that match any of "*.p", "*.w" or "*.i*" and excluding the directories called ".svn" and "pdv". Many thanks – Colin Aug 18 '11 at 15:10
  • Both of your examples contain `--excluse "pdv"` (note the typo: s instead of d), and that is exactly the condition you say is not working... just checking that the typo is not the main problem. – geronime Aug 18 '11 at 21:05
  • Well, I think that "D'Oh!" is a good start. Thanks for spotting that. – Colin Aug 19 '11 at 08:18
  • @geronime, I just tried that example with the typo fixed (I hope). The search string is `grep -ilR "run" . --exclude ".svn" --exclude "pdv" --exclude "!.{p,w,i*}"`. Unfortunately as the results set now includes both `.svn/text-base/jr83144.p.svn-base` and `pdv/cm/backupds.i` I don't think that this has worked. Many thanks – Colin Aug 19 '11 at 08:26
  • Have you tried the `--exclude-dir` parameter instead? I think that is the actual problem. Refer to the `grep` manual. – geronime Aug 19 '11 at 08:56
  • To exclude directories: `find . \( \( -name .svn -o -name pdv \) -type d -prune \) -o \( -name '*.[pwi]' -exec grep ... {} + \)` – Adrian Pronk Aug 19 '11 at 09:22
  • @geronime, I've tried `--exclude-dir`, but it didn't appear to register. Neither it nor `--exclude` is listed in the manual. – Colin Aug 19 '11 at 09:36
  • Just wanted to say thanks to all of you for your help. Even where an answer hasn't worked, it's giving me a good insight into how grep and find work. – Colin Aug 19 '11 at 10:06
  • @Colin: I see, you were not talking about Linux. `--exclude` and `--exclude-dir` are options of GNU `grep`, the version shipped on Linux. – geronime Aug 19 '11 at 14:04

4 Answers

2

The following command finds only *.rb files containing the line require 'bundler/setup' and skips the .git and .bundle directories. I think that is the same use case.

grep -ril --exclude-dir .git --exclude-dir .bundle \
  --include \*.rb "^require 'bundler/setup'$" .

I believe the problem was that --exclude was used where --exclude-dir was needed. Refer to the grep(1) manual.

Also note that the exclude/include parameters accept globs only, not regexps. A single-character suffix range can therefore be covered by one --include parameter, but more complex conditions need several of them:

--include \*.[pwi] --include \*.multichar_sfx ...
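
Applied to the question's case, this might look like the sketch below. It assumes GNU grep (the options are not available in every grep); the temporary tree and its file names are made up purely so the command can be tried safely.

```shell
# Build a small throwaway tree (hypothetical file names, for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/.svn/text-base" "$demo/pdv/cm"
echo 'RUN Search For Me.' > "$demo/src/order.p"
echo 'search for me' > "$demo/src/include.i"
echo 'search for me' > "$demo/src/notes.txt"                   # wrong suffix
echo 'search for me' > "$demo/.svn/text-base/order.p.svn-base" # excluded directory
echo 'search for me' > "$demo/pdv/cm/backupds.i"               # excluded directory

# --exclude-dir skips directories by name; each --include adds one file-name glob.
matches=$(cd "$demo" && grep -ril \
  --exclude-dir=.svn --exclude-dir=pdv \
  --include='*.p' --include='*.w' --include='*.i*' \
  'search for me' . | LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Only the two files under src with a wanted suffix should be listed.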
geronime
2

How about something like:

find . \( \( -name .svn -o -name pdv \) -type d -prune \) -o \( -name '*.[pwi]' -type f -exec grep -i -l "search for me" {} + \)

This will:
- ignore the contents of directories named .svn and pdv
- grep regular files named *.[pwi] (because of -type f, symlinks to files are skipped)

The + at the end of -exec tells find to gather as many files into a single command as will fit on the command line (roughly 1 million chars in Linux). This can seriously speed up processing if you have to iterate over thousands of files.
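
A rough way to see the batching is to count how often the command is started with each terminator. This is only a sketch: the five files and the `sh -c 'echo run'` stand-in for grep are made up for the demonstration.

```shell
# Five small files in a throwaway directory (hypothetical names).
demo=$(mktemp -d)
for n in 1 2 3 4 5; do echo 'search for me' > "$demo/file$n.p"; done

# With \; the command is started once per file.
per_file=$(find "$demo" -name '*.p' -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# With + the file names are batched into as few invocations as possible.
batched=$(find "$demo" -name '*.p' -type f -exec sh -c 'echo run' sh {} + | wc -l | tr -d ' ')

echo "per-file runs: $per_file, batched runs: $batched"
rm -rf "$demo"
```

On a tree this small, the batched form should start the command just once, against five starts for the per-file form.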

Adrian Pronk
0

You can try the following:

find path_starting_point -type f | grep regex_to_filter_file_names | xargs grep regex_to_find_inside_matched_files
Heisenbug
  • I've tried this, but I can't get the "regex_to_filter_file_names" to work properly. I've tried `find . -type f | grep .*\.p | xargs -il grep "run"`, but it's returning files ending .ixx as well as ending .p. – Colin Aug 18 '11 at 15:08
  • maybe `grep '\.[pwi]$'` to match files ending with a ".p", ".w" or ".i" suffix – geronime Aug 19 '11 at 08:50
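
A sketch of the pipeline with a quoted, anchored file-name filter, run against a made-up tree. In the unquoted `grep .*\.p` attempt the shell removes the backslash and nothing anchors the pattern to the end of the name, so it can match anywhere in the path. Quoting the pattern and ending it with `$` avoids both problems.

```shell
# Throwaway tree with one unwanted suffix (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/src"
echo 'run' > "$demo/src/a.p"
echo 'run' > "$demo/src/b.w"
echo 'run' > "$demo/src/c.ixx"
echo 'run' > "$demo/src/d.i"

# Literal dot, one of p/w/i, then end of line: c.ixx is filtered out.
matches=$(cd "$demo" && find . -type f | grep '\.[pwi]$' | xargs grep -il 'run' | LC_ALL=C sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

File names containing whitespace would break this plain `xargs` pipeline. The `find ... -exec grep ... {} +` form does not have that problem.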
0
find . -name "filename_regex" | grep -v -e '/\.svn/' -e '/pdv/' | xargs grep -i 'your search string'
Vijay
  • I've also given this a try, but again can't get the regex to work. Just using a simple example of `find . -name ".*\.i"` to try out the concept isn't returning any values. – Colin Aug 18 '11 at 15:10
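
The likely reason the attempt above finds nothing is that `-name` takes a shell glob matched against the file's base name, not a regular expression. A minimal sketch on a made-up tree:

```shell
# Throwaway tree (hypothetical file names).
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/src/defs.i" "$demo/src/main.p"

# Read as a glob, ".*\.i" means: starts with a dot and ends in ".i". Nothing here does.
as_regex=$(cd "$demo" && find . -name '.*\.i')

# The glob for "any name ending in .i".
as_glob=$(cd "$demo" && find . -name '*.i')

echo "regex-style pattern: '$as_regex'  glob: '$as_glob'"
rm -rf "$demo"
```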