122

After a few Google searches, what I came up with is:

find my_folder -type f -exec grep -l "needle text" {} \; -exec file {} \; | grep text

which is very unhandy and outputs unneeded text such as MIME type information. Are there any better solutions? I have lots of images and other binary files in the same folder as the many text files that I need to search through.

Shai
datasn.io

17 Answers

215

I know this is an old thread, but I stumbled across it and thought I'd share my method which I have found to be a very fast way to use find to find only non-binary files:

find . -type f -exec grep -Iq . {} \; -print

The -I option tells grep to ignore binary files immediately, and the pattern . (any single character) combined with -q makes it stop at the first match in a text file, so it goes very fast. You can change the -print to -print0 and pipe into xargs -0 or something similar if you are concerned about spaces in file names (thanks for the tip, @lucas.werkmeister!)

Also the first dot is only necessary for certain BSD versions of find such as on OS X, but it doesn't hurt anything just having it there all the time if you want to put this in an alias or something.

EDIT: As @ruslan correctly pointed out, the -and can be omitted since it is implied.
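
As a rough sketch of how the same filter can feed an actual search: the first -exec acts as the text-file test, and a second -exec runs only on files that passed it. The function name search_text_files and its arguments are made up for illustration, and treating the needle as a fixed string (-F) is an assumption on my part.

```shell
# Hypothetical helper, not part of the original answer.
# usage: search_text_files DIRECTORY NEEDLE_TEXT
search_text_files() {
    # 1st -exec: succeeds only for non-empty files that grep -I treats as text.
    # 2nd -exec: runs only for those files and lists the ones containing the needle.
    find "$1" -type f -exec grep -Iq . {} \; -exec grep -lF -- "$2" {} +
}
```

Called as search_text_files my_folder "needle text", this still starts one grep per file for the test, so it keeps the speed trade-off of the \; form.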

Cassie Dee
  • 19
    On Mac OS X, I need to change this to `find . -type f -exec grep -Il "" {} \;`. – Alec Jacobson Jan 05 '14 at 22:49
  • 3
    This is better than peoro's answer because 1. it actually answers the question 2. It does not yield false positives 3. it is way more performant – user123444555621 Mar 29 '14 at 09:25
  • 6
    You can also use `find -type f -exec grep -Iq . {} \; -and -print` which has the advantage that it keeps the files in `find`; you can substitute `-print` with another `-exec` that is only run for text files. (If you let `grep` print the file names, you won’t be able to distinguish file names with newlines in them.) – Lucas Werkmeister Jul 22 '15 at 11:11
  • @lucas.werkmeister Good idea! – Cassie Dee Jul 22 '15 at 16:57
  • Unfortunately, if you have a large text file with very long lines, this will choke `grep` and consume a lot of memory. – Nathan S. Watson-Haigh Jun 28 '17 at 03:27
  • 1
    @NathanS.Watson-Haigh It shouldn't, because it should be matching text files immediately. Do you have a specific use case you can share? – Cassie Dee Jun 28 '17 at 17:37
  • I can assure you @histumness that I do have a file on which this happens. Whether it's a "valid" file is another question as I think from memory it may also not have had an EOL character! :) – Nathan S. Watson-Haigh Jul 04 '17 at 21:42
  • I didn't know you could use the result of an exec statement in a conditional. Very nice. – otocan Aug 03 '17 at 09:35
  • 4
    `find . -type f -exec grep -Il . {} +` is much faster. Drawback is that it cannot be extended by another `-exec` as @lucas.werkmeister suggested – Henning Mar 08 '18 at 10:07
  • @Henning Oh good call. I didn't know about the `+` option to `find`. Very nice. – Cassie Dee Mar 10 '18 at 15:08
  • `find . -type f -exec grep -Il . {} +` brought binaries to me. Thanks! – Eduardo Lucio Oct 15 '18 at 15:25
  • 1
    `-and` can be omitted here, since it's the default operator (e.g. you didn't use it between `-type` and `-exec`). – Ruslan Feb 15 '19 at 19:32
  • 1
    @Henning I did some tests with `time` in a folder with ~200000 text files and many other files. (My ~/Downloads folder.) I did not see any obvious non-text files with either method, but did not look too closely. `find . -type f -exec grep -Il . {} +` took 42.3 seconds. `find . -type f -exec grep -Iq . {} \; -print` took 22.7 **minutes**. So I strongly recommend the first where possible. Running it again after the previous two commands took less than 10 seconds. – adamf Jan 11 '20 at 02:12
  • This command did not work for me. I've got large amount of raw binary images (custom format: one byte - one pixel, no compression, no headers or other embedded metainformation), many of them were listed. – wl2776 Mar 05 '20 at 08:14
  • What about if you're not using `grep`? – Aaron Franke Mar 19 '20 at 20:59
  • This is awesome. Thank you. Performance suggestion: all the forms of find that use -exec will fork a grep process for every file, which is slow. Following will run much faster: `find . -type f -print0 | xargs -0 grep -Iq ""` – Marcin K Apr 27 '21 at 14:27
  • This variant doesn't print the files with permission errors: `find . -type f 2>&- -exec grep -Iq . {} \; -print` – user598527 Jun 15 '23 at 08:43
11

Based on this SO question:

grep -rIl "needle text" my_folder

Community
crayzeewulf
10

Why is it unhandy? If you need to use it often, and don't want to type it every time just define a bash function for it:

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text
}

put it in your .bashrc and then just run:

findTextInAsciiFiles your_folder "needle text"

whenever you want.


EDIT to reflect OP's edit:

If you want to cut out the MIME information, you can add one more stage to the pipeline that filters it out. This should do the trick, by keeping only what comes before the first colon (cut -d':' -f1):

function findTextInAsciiFiles {
    # usage: findTextInAsciiFiles DIRECTORY NEEDLE_TEXT
    find "$1" -type f -exec grep -l "$2" {} \; -exec file {} \; | grep text | cut -d ':' -f1
}
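
A possible variant checks the type first and greps second, and avoids parsing the human-readable description that file prints. This is only a sketch: it assumes a file implementation that supports -b and --mime-type, and the function name is hypothetical.

```shell
# Hypothetical variant, not the original function.
# usage: findTextInTextFiles DIRECTORY NEEDLE_TEXT
findTextInTextFiles() {
    find "$1" -type f -exec sh -c '
        needle=$1
        shift
        for f do
            # file -b --mime-type prints only the type, e.g. text/plain
            case $(file -b --mime-type -- "$f") in
                text/*) grep -lF -- "$needle" "$f" ;;
            esac
        done
    ' sh "$2" {} +
}
```

Some textual formats may report a type outside text/ (JSON is commonly reported as application/json), so the case pattern may need widening for your files.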
peoro
  • I'm not sure if "grep text" is accurate enough to get exactly all text files - I mean, is there any text file types that have no 'text' in the string of its mime type description? – datasn.io Jan 22 '11 at 11:58
  • @kavoir.com: yes. From `file` manual: "Users depend on knowing that all the readable files in a directory have the word ‘text’ printed." – peoro Jan 22 '11 at 12:13
  • 2
    Wouldn't it be a bit more clever to search for text files before grepping, instead of grepping and then filtering out text files? – user unknown Mar 17 '12 at 16:13
  • `/proc/meminfo`, `/proc/cpuinfo` etc. are text files, but `file /proc/meminfo` says `/proc/meminfo: empty`. I wonder if 'empty' should be tested in addition to 'text', but not sure if also other types could report 'empty'. – Timo Kähkönen Mar 08 '13 at 00:26
  • "Why is it unhandy?" - "outputs unneeded texts". This answer does not solve that. – user123444555621 Mar 28 '14 at 08:46
  • @Pumbaa80: uh? OP added "and outputs unneeded texts such as mime type information" later, modifying the question, and this answer was modified as well adding `cut` to the pipeline in order to only get the data OP's interested about. I'm not getting your point... – peoro Mar 29 '14 at 17:04
  • Option -q should be added to the first grep to avoid "unneeded texts" – Henning Mar 08 '18 at 09:49
5
find . -type f -print0 | xargs -0 file | grep -P text | cut -d: -f1 | xargs grep -Pil "search"

This is unfortunately not space safe. Putting this into a bash script makes it a bit easier.

This is space safe:

#!/bin/bash
# Require a search pattern as the first argument
if [ -z "$1" ]; then
    echo "Usage: $0 <search>"
    exit 1
fi

find . -type f -print0 \
  | xargs -0 file \
  | grep -P text \
  | cut -d: -f1 \
  | xargs -I% grep -Pil "$1" "%"
user unknown
Antti Rytsölä
  • 2
    There are a couple of issues in your script: 1. what if a binary file is named `text.bin`? 2. What if a filename contains a `:`? – thkala Jan 22 '11 at 11:53
4

Another way of doing this:

# find . -type f | xargs file | grep "ASCII text"

If you want empty files too:

# find . -type f | xargs file | grep -E "ASCII text|empty"
The IT Guy
2

I have two issues with histumness' answer:

  • It only lists text files. It does not actually search them as requested. To actually search, use

    find . -type f -exec grep -Iq . {} \; -and -print0 | xargs -0 grep "needle text"
    
  • It spawns a grep process for every file, which is very slow. A better solution is then

    find . -type f -print0 | xargs -0 grep -IZl . | xargs -0 grep "needle text"
    

    or simply

    find . -type f -print0 | xargs -0 grep -I "needle text"
    

    This only takes 0.2s compared to 4s for solution above (2.5GB data / 7700 files), i.e. 20x faster.

Also, nobody cited ag (the Silver Searcher) or ack-grep as alternatives. If either of these is available, it is a much better alternative:

ag -t "needle text"    # Much faster than ack
ack -t "needle text"   # or ack-grep

As a last note, beware of false positives (binary files taken for text files). I have already had false positives with each of grep, ag, and ack, so it is better to list the matched files first before editing them.

Fuujuhi
fuujuhi
2

Here's a simplified version with extended explanation for beginners like me who are trying to learn how to put more than one command in one line.

If you were to write out the problem in steps, it would look like this:

// For every file in this directory
// Check the filetype
// If it's an ASCII file, then print out the filename

To achieve this, we can use three UNIX commands: find, file, and grep.

find will check every file in the directory.

file will give us the filetype. In our case, we're looking for a return of 'ASCII text'

grep will look for the keyword 'ASCII' in the output from file

So how can we string these together in a single line? There are multiple ways to do it, but I find that doing it in order of our pseudo-code makes the most sense (especially to a beginner like me).

find ./ -exec file {} ";" | grep 'ASCII'

Looks complicated, but not bad when we break it down:

find ./ = look through every file in this directory. The find command prints out the filename of any file that matches the 'expression', or whatever comes after the path, which in our case is the current directory or ./

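The most important thing to understand is that everything after that first bit is evaluated, for each file, as either True or False. With no action in the expression, find prints the name of every file for which it is True; here the -exec action runs file instead, and it is file that prints each line of output.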

-exec = this flag is an option within the find command that allows us to use the result of some other command as the search expression. It's like calling a function within a function.

file {} = the command being called inside of find. The file command returns a string that tells you the filetype of a file. Regularly, it would look like this: file mytextfile.txt. In our case, we want it to use whatever file is being looked at by the find command, so we put in the curly braces {} to act as an empty variable, or parameter. In other words, we're just asking for the system to output a string for every file in the directory.

";" = this is required by find and is the punctuation mark at the end of our -exec command. See the manual for 'find' for more explanation if you need it by running man find.

| grep 'ASCII' = | is a pipe. A pipe takes the output of whatever is on the left and uses it as input to whatever is on the right. Here it takes the whole output of the find command (one line per file, giving the file name and its filetype) and keeps only the lines that contain the string 'ASCII'.

NOW, note that grep is not part of the find expression: find runs file on every file, and grep filters the combined output down to the ASCII ones. Voila.
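
To see the filtering step in isolation, here is a small sketch that feeds two made-up lines, written in the "name: type" format that file prints, through the same grep and then trims the type with cut. The sample lines and the function name keep_ascii are invented for illustration.

```shell
# Hypothetical helper: keep only lines whose description mentions ASCII,
# then strip everything from the first colon onward.
keep_ascii() {
    grep 'ASCII' | cut -d: -f1
}

# Simulated output of: find ./ -exec file {} ";"
printf '%s\n' './notes.txt: ASCII text' './logo.png: PNG image data' | keep_ascii
# prints: ./notes.txt
```

In real use the left side of the pipe would be the find command itself, for example find ./ -exec file {} ";" | keep_ascii.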

mepler
2

How about this:

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable'

If you want the filenames without the file types, just add a final sed filter.

$ grep -rl "needle text" my_folder | tr '\n' '\0' | xargs -r -0 file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'

You can filter-out unneeded file types by adding more -e 'type' options to the last grep command.

EDIT:

If your xargs version supports the -d option, the commands above become simpler:

$ grep -rl "needle text" my_folder | xargs -d '\n' -r file | grep -e ':[^:]*text[^:]*$' | grep -v -e 'executable' | sed 's|:[^:]*$||'
thkala
  • silly me. Didn't notice recursive grep. as I understood it's actually quite fast even though a bit limited in many applications. +1 for you. – Antti Rytsölä Jan 22 '11 at 11:49
2

Here's how I've done it ...

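1. Make a small script, istext, that tests whether a file is plain text (save it somewhere on your PATH and make it executable):

#!/bin/bash
# Succeeds if file reports a text/* MIME type for the first argument
[[ "$(file -bi "$1")" == text/* ]]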

2. Use find as before:

find . -type f -exec istext {} \; -exec grep -nHi mystring {} \;
1

Although it is an old question, I think the info below will add to the quality of the answers here.

When ignoring files with the executable bit set, I just use this command:

find . ! -perm -111

To keep it from recursing into other directories:

find . -maxdepth 1 ! -perm -111

No need for pipes to mix lots of commands, just the powerful plain find command.

  • Disclaimer: it is not exactly what OP asked, because it doesn't check if the file is binary or not. It will, for example, filter out bash script files, that are text themselves but have the executable bit set.

That said, I hope this is useful to anyone.
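
If both conditions matter, the permission test can be combined with a content test in the same find call. A minimal sketch, assuming a grep that supports -I (GNU and BSD grep do); the function name is hypothetical.

```shell
# Hypothetical helper: list files that lack the full set of execute bits
# AND that grep -I considers text (non-empty, no binary content).
# usage: list_nonexec_text [DIRECTORY]
list_nonexec_text() {
    find "${1:-.}" -type f ! -perm -111 -exec grep -Il . {} +
}
```

Executable shell scripts are still skipped by this version, for the same reason given in the disclaimer above.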

DrBeco
0

I do it this way: 1) Since there are too many files (~30k) to search through, I generate the text file list daily via crontab using the command below:

find /to/src/folder -type f -exec file {} \; | grep text | cut -d: -f1 > ~/.src_list &

2) Create a function in .bashrc:

findex() {
    cat ~/.src_list | xargs grep "$*" 2>/dev/null
}

Then I can use the command below to do the search:

findex "needle text"

HTH:)
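
A variation on the same caching idea, sketched with grep -I as the text test instead of file, and with the list fed to xargs NUL-delimited so that paths with spaces survive. The function names and the choice to pass the list path as an argument are assumptions for illustration, not part of the setup above. Paths containing newlines would still break the list.

```shell
# Hypothetical helpers; names and arguments are illustrative only.

# usage: build_text_list SOURCE_DIR LIST_FILE
build_text_list() {
    find "$1" -type f -exec grep -Il . {} + > "$2"
}

# usage: search_text_list LIST_FILE NEEDLE_TEXT
search_text_list() {
    # /dev/null is an extra operand so grep always prefixes the file name.
    tr '\n' '\0' < "$1" | xargs -0 grep -F -- "$2" /dev/null 2>/dev/null
}
```

The cron job would call build_text_list /to/src/folder ~/.src_list, and an interactive search would then be search_text_list ~/.src_list "needle text".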

Frank Fang
0

I prefer xargs

find . -type f | xargs grep -I "needle text"

If your filenames are weird (spaces, quotes, newlines), use the -print0 and -0 options:

find . -type f -print0 | xargs -0 grep -I "needle text"
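
A short sketch of the NUL-delimited pipeline wrapped in a hypothetical function, here listing matching files (-l) and treating the needle as a fixed string (-F), both of which are my assumptions. Note that -print0 and -0 are extensions found in GNU and BSD tools rather than strict POSIX.

```shell
# Hypothetical wrapper around the NUL-delimited pipeline.
# usage: grep_text_files NEEDLE [DIRECTORY]
grep_text_files() {
    find "${2:-.}" -type f -print0 | xargs -0 grep -IlF -- "$1"
}
```

With a file named "my notes.txt", the plain find | xargs form would hand grep two arguments, my and notes.txt, whereas this form passes the name through intact.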
dalore
0
  • Bash example to search for the text "eth0" in all text/ASCII files under /etc:

grep eth0 $(find /etc/ -type f -exec file {} \; | egrep -i "text|ascii" | cut -d ':' -f1)

Gabriel G
0

If you are interested in finding any file type by their magic bytes using the awesome file utility combined with power of find, this can come in handy:

$ # Let's make some test files
$ mkdir ASCII-finder
$ cd ASCII-finder
$ dd if=/dev/urandom of=binary.file bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.009023 s, 116 MB/s
$ file binary.file
binary.file: data
$ echo 123 > text.txt
$ # Let the magic begin
$ find -type f -print0 | \
    xargs -0 -I @@ bash -c 'file "$@" | grep ASCII &>/dev/null && echo "file is ASCII: $@"' -- @@

Output:

file is ASCII: ./text.txt

Legend: $ is the interactive shell prompt where we enter our commands

You can modify the part after && to call some other script or do some other stuff inline as well, i.e. if that file contains given string, cat the entire file or look for a secondary string in it.

Explanation:

  • find items that are files
  • Make xargs feed each item as a line into one liner bash command/script
  • file checks type of file by magic byte, grep checks if ASCII exists, if so, then after && your next command executes.
  • find prints results null separated, this is good to escape filenames with spaces and meta-characters in it.
  • xargs , using -0 option, reads them null separated, -I @@ takes each record and uses as positional parameter/args to bash script.
  • -- for bash ensures whatever comes after it is an argument even if it starts with - like -c which could otherwise be interpreted as bash option

If you need to find types other than ASCII, simply replace grep ASCII with other type, like grep "PDF document, version 1.4"

0
find . -type f | xargs file | grep "ASCII text" | awk -F: '{print $1}'

Use find command to list all files, use file command to verify they are text (not tar,key), finally use awk command to filter and print the result.

Roy Zeng
  • Not all text file contains only ASCII characters, Unicode characters are possible, e.g. `file` report `UTF8 text` for those text files. – Michael Lee Oct 13 '22 at 07:56
0
grep --recursive --binary-files=without-match --files-with-matches --no-messages . | xargs -d '\n' realpath

This has worked satisfactorily thus far. I'm piping the grep results to realpath in order to get absolute paths. xargs -d '\n' handles potential spaces in filenames and paths.

Replace . with the desired search path when necessary.

user598527
-4

How about this

 find . -type f|xargs grep "needle text"
Navi