How does `grep` determine which files are "text"?

Question

This question is about how grep is used and how it works internally. If I run `grep 'needle' *`, grep searches for needle in all text files in the current directory (see, e.g., http://www.linfo.org/grep.html). But what counts as a "text file", and how does grep identify one?

For example, files encoded as ASCII, UTF-8, or UTF-16 could all be considered text files, yet grep does not search UTF-16 files by default.

As for identification, does grep use solely the file signature at the beginning of the file?

I'm searching this for the word "binary_files": https://git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c. It's a lot to go through. I *think* what you're looking for is around line 1483. Symbols `nlines_first_null`, `buf_has_nulls`, and `file_must_have_nulls` may be of interest. — , Aug 06 '17 at 22:24
I don't know the answer, but I like @Amy's approach of looking at the code as a source of truth. Although I would hope something like this (especially the question on which file encodings are included) would be included in the documentation. — user4624937, Aug 06 '17 at 22:28
While grep's determination is likely accurate enough in many cases, only you can say that a file is text or not, and if so, what the encoding is. If you do want to grep a UTF-16 encoded text file, you can [convert it to UTF-8](https://stackoverflow.com/a/3781221/2226988) first and pipe it to grep. — Tom Blodget, Aug 07 '17 at 17:13
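To make the two ideas in these comments concrete, here is a minimal Python sketch. It is not grep's actual logic: `looks_binary` is a hypothetical helper that assumes a NUL-byte check similar to what the symbol names `buf_has_nulls` and `file_must_have_nulls` suggest, and `search_utf16` is a hypothetical stand-in for "convert to UTF-8 first, then search". The sample strings are made up for illustration.

```python
def looks_binary(data: bytes) -> bool:
    """Hypothetical heuristic: call a buffer binary if it contains a NUL byte.

    This is a simplification for illustration, not grep's real implementation.
    """
    return b"\x00" in data


def search_utf16(data: bytes, needle: str) -> list:
    """Decode UTF-16 bytes (BOM expected) and return the lines containing needle."""
    text = data.decode("utf-16")
    return [line for line in text.splitlines() if needle in line]


# Made-up sample content, encoded two ways.
sample = "hay\nneedle here\n"
utf8_sample = sample.encode("utf-8")
utf16_sample = sample.encode("utf-16")  # adds a BOM; ASCII chars gain a 0x00 byte

print(looks_binary(utf8_sample))
print(looks_binary(utf16_sample))
print(search_utf16(utf16_sample, "needle"))
```

Under this assumed heuristic, the UTF-16 sample is flagged as binary because every ASCII character is stored with an extra zero byte, which would be consistent with the observation that grep skips UTF-16 by default. Decoding first, as Tom Blodget suggests, sidesteps the problem.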

