
This question pertains to the usage and inner workings of grep. If I run `grep 'needle' *`, grep searches for `needle` in all text files of the current directory (see, e.g., http://www.linfo.org/grep.html). But what constitutes these so-called "text files", and how does grep identify them?

For example, ASCII, UTF-8, and UTF-16 files could all be considered text files, but grep does not search UTF-16 files by default.

As for identification, does grep use solely the file signature at the beginning of the file?
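
For context, the GNU grep manual (under `--binary-files`) says non-text bytes indicate binary data: bytes improperly encoded for the current locale, or null input bytes when `-z` is not given. That points to a check on the content itself rather than on a magic-number signature. The Python sketch below approximates that rule under loudly stated assumptions: the locale is UTF-8, and only the first `PREFIX_SIZE` bytes (a hypothetical constant) are inspected. It is not grep's implementation, only an illustration. It also shows why UTF-16 text trips such a check: each ASCII character is stored next to a zero byte.

```python
import codecs
import sys

# Hypothetical: how many leading bytes of a file to inspect.
PREFIX_SIZE = 32 * 1024


def looks_binary(data: bytes, complete: bool = True) -> bool:
    """Guess whether `data` is binary, ASSUMING a UTF-8 locale.

    A NUL byte, or any byte sequence that is not valid UTF-8, counts as
    binary. Pass complete=False when `data` is only a prefix, so that a
    multibyte character cut off at the end is not mistaken for garbage.
    """
    if b"\x00" in data:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=complete)
    except UnicodeDecodeError:
        return True
    return False


def file_looks_binary(path: str) -> bool:
    """Apply looks_binary to (at most) the first PREFIX_SIZE bytes of a file."""
    with open(path, "rb") as f:
        chunk = f.read(PREFIX_SIZE)
    # A short read means the whole file was seen, so decoding can be final.
    return looks_binary(chunk, complete=len(chunk) < PREFIX_SIZE)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        for path in sys.argv[1:]:
            print(path, "binary" if file_looks_binary(path) else "text")
    else:
        samples = {
            "ascii": "needle in a haystack\n".encode("ascii"),
            "utf-8": "na\u00efve needle\n".encode("utf-8"),
            "utf-16": "needle\n".encode("utf-16"),
        }
        # Expected: ascii False, utf-8 False, utf-16 True
        for name, data in samples.items():
            print(name, looks_binary(data))
```

Passing file names (for example `python looks_binary.py *`, where the script name is hypothetical) classifies each file the same way.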

flow2k
  • I'm searching this for the word "binary_files": https://git.savannah.gnu.org/cgit/grep.git/tree/src/grep.c. It's a lot to go through. I *think* what you're looking for is around line 1483. Symbols `nlines_first_null`, `buf_has_nulls`, and `file_must_have_nulls` may be of interest. –  Aug 06 '17 at 22:24
  • I don't know the answer, but I like @Amy's approach of looking at the code as a source of truth. That said, I would hope something like this (especially the question of which file encodings are included) would be covered in the documentation. – user4624937 Aug 06 '17 at 22:28
  • @user4624937 you know what they say: code doesn't lie. –  Aug 07 '17 at 02:54
  • While grep's determination is likely accurate enough in many cases, only you can say whether a file is text, and if so, what its encoding is. If you do want to grep a UTF-16 encoded text file, you can [convert it to UTF-8](https://stackoverflow.com/a/3781221/2226988) first and pipe it to grep. – Tom Blodget Aug 07 '17 at 17:13
