65

How can I know if a file is a binary file?

For example, a compiled C file.

I want to read all files from some directory, but I want to ignore binary files.

kenorb
Refael
    Ultimately *all* files are binary. Text files just happen to contain binary representations of human-readable character data. No method for distinguishing text from non-text can be 100% reliable. – Keith Thompson Jun 18 '13 at 22:48
  • [Similar in Vim.](http://vi.stackexchange.com/q/3206/467) – kenorb May 08 '15 at 22:23

13 Answers

78

Use the `file` utility. Sample usage:

 $ file /bin/bash
 /bin/bash: Mach-O universal binary with 2 architectures
 /bin/bash (for architecture x86_64):   Mach-O 64-bit executable x86_64
 /bin/bash (for architecture i386): Mach-O executable i386

 $ file /etc/passwd
 /etc/passwd: ASCII English text

 $ file code.c
 code.c: ASCII c program text

See the `file` manual page.
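Building on that, here is a sketch of what the question asks for (reading only the non-binary files in a directory), assuming your `file` supports the `-b` and `--mime-encoding` options:

```shell
# Print the names of the non-binary files in the current directory.
for f in ./*; do
    [ -f "$f" ] || continue                     # skip directories and the like
    enc=$(file -b --mime-encoding "$f")         # e.g. "us-ascii", "utf-8", "binary"
    if [ "$enc" != "binary" ]; then
        printf '%s\n' "$f"                      # keep anything file does not call binary
    fi
done
```

Replace the `printf` with whatever processing you need (e.g. `cat "$f"`).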

Adam Siemion
    Consider using 'file --mine'. For binary files it reports "... charset=binary", so one can simply grep for the regexp "binary$". – 4dan May 21 '14 at 20:00
    @4dan - perhaps `--mime`? :) – Bach Jun 19 '14 at 08:49
    @4dan Works for me: `file -bL --mime "$path" | grep -q '^text'`. Option `-b` removes the filename from the output, and `-L` dereferences symlinks. – wjandrea Nov 01 '17 at 02:48
    1. Does that work on non-x86 architectures? 2. do you consider a pdf file binary? – Victor Eijkhout Dec 06 '17 at 15:00
  • The answer should contain the `--mime` flag as it's otherwise not realistic to match output of `file` for all possible binary formats (such regex would be too long and fragile). – yugr Jul 23 '18 at 19:12
16

Adapted from excluding binary file

find . -exec file {} \; | grep text | cut -d: -f1
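Note that `grep text` also matches any *filename* containing the word "text", since `file` prefixes each output line with the name. A variant that sidesteps this by running `file -b` (which omits the filename) once per file might look like this sketch:

```shell
# Print files whose file(1) description contains "text", without ever
# grepping the filename itself (file -b omits the name from the output).
find . -type f -exec sh -c '
    for f in "$@"; do
        if file -b "$f" | grep -q text; then
            printf "%s\n" "$f"
        fi
    done
' sh {} +
```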
gongzhitaao
    This should be `grep text`; historically, `file` didn't always say ASCII, but rather "shell script text" for example. – Jens May 26 '13 at 15:32
    @Jens Thanks for reminding. Just check `file` manpage, it should be `text`. – gongzhitaao May 26 '13 at 15:46
  • I just realized that reinvented the wheel once again: for file in `find . -type f -exec file {} \; | grep text | perl -nle 'split /:/;print $_[0]' `; do grep -i --color 'string_to_search' $file ; done ; – Yordan Georgiev Nov 22 '13 at 07:46
    Thanks, used and adjusted it to find all binary files in a folder: `find . -type f -exec file {} \; | grep -v text | cut -d: -f1` – Gerrit Nov 05 '14 at 10:04
    and what if the filename contains the word "text"? I use grep ".*:.*text" now – Algoman Mar 17 '17 at 12:20
  • If (like me) you find the `\;` especially ugly, note that in many cases `+` (POSIX-compatible) will do as well: `find ... -exec foo {} +` is similar to `find ... | xargs foo`. – Alois Mahdal May 09 '17 at 17:08
    @Algoman I use `file -b`, which doesn't output the filename. (Might be a GNU-only feature). – wjandrea Nov 01 '17 at 02:25
14

I use

! grep -qI . "$path"

The only drawback I can see is that it will consider an empty file binary, but then again, who decides if that is wrong?

EDIT based on @mgutt's suggestion:

In some contexts the file could be huge, so depending on what you need to do, it might be safer and sufficient to read only part of the file:

head -c 1024 "$path" | grep -qI .

Keep in mind though, that you will need to choose the size wisely; 1024 bytes of text plus a null byte is still a binary file.
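Wrapped into a small helper (the function name is my own, not part of grep), the whole thing looks like:

```shell
# Succeeds (exit 0) when the first 1024 bytes look like text.
# grep -I treats binary input as non-matching, and '.' matches
# any character on a non-empty line.
is_text_file() {
    head -c 1024 "$1" | grep -qI .
}
```

Usage: `is_text_file myfile && cat myfile`.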

Alois Mahdal
  • The empty file case can be controlled by adding `|| ! test -s $path`. – yugr Jul 23 '18 at 19:18
    Grep for empty string (`''`), not for any single character (`'.'`): **`! fgrep -qI '' "$path"`**. In that way empty files and files consisting only of new-line markers (line feeds) will be treated as textual. – Sasha Jul 07 '19 at 10:36
  • @yugr, that wouldn't really help, because the original Alois Mahdal's code will treat not only absolutely empty files (zero size) as binary, but also files consisting of one or more linefeeds. But that could be easily fixed (see my comment above), Alois Mahdal's idea is great. – Sasha Jul 07 '19 at 10:42
    Note: This will read the whole file if it is a text file, for example a 1GB log file will be fully processed. Maybe its better to check only the first 1024 bytes with `head -c 1024 "$path" | grep -qIF ''`?! – mgutt Apr 30 '23 at 23:36
  • @mgutt Nice, although the curse of format detection will still rear its ugly head: *technically* a 10GB of ASCII and one null byte is still a binary file, so figuring out the offset would be essential and context dependent... – Alois Mahdal May 02 '23 at 06:21
5

BSD grep

Here is a simple solution to check for a single file using BSD grep (on macOS/Unix):

grep -q "\x00" file && echo Binary || echo Text

which basically checks whether the file contains a NUL character.

Using this method, to read all non-binary files recursively with the find utility, you can do:

find . -type f -exec sh -c 'grep -q "\x00" {} || cat {}' ";"

Or even simpler using just grep:

grep -rv "\x00" .

For just the current folder, use:

grep -v "\x00" *

Unfortunately, the above examples won't work with GNU grep; however, there is a workaround.

GNU grep

Since GNU grep ignores NUL characters, it's possible to check for other non-ASCII characters instead:

$ grep -P "[^\x00-\x7F]" file && echo Binary || echo Text

Note: It won't work for files containing only NULL characters.
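If your GNU grep was built with PCRE support, another workaround (not from this answer, and subject to that build-time assumption) is to search for the NUL byte through the Perl-regex engine, mirroring the BSD version above:

```shell
# NUL-byte test via GNU grep's Perl-regex engine (-P).
# Exit status 0 when a NUL byte is found, i.e. the file is likely binary.
grep -qP '\x00' file && echo Binary || echo Text
```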

kenorb
4

perl -E 'exit((-B $ARGV[0])?0:1);' file-to-test

This can be used to check whether "file-to-test" is binary. The above command will exit with code 0 on binary files; otherwise the exit code will be 1.

The reverse check for a text file can look like the following command:

perl -E 'exit((-T $ARGV[0])?0:1);' file-to-test

Likewise, the above command will exit with status 0 if "file-to-test" is text (not binary).

Read more about the -B and -T checks with the command perldoc -f -X.

Onlyjob
3

Use Perl’s built-in -T file test operator, preferably after ascertaining that it is a plain file using the -f file test operator:

$ perl -le 'for (@ARGV) { print if -f && -T }' \
    getwinsz.c a.out /etc/termcap /bin /bin/cat \
    /dev/tty /usr/share/zoneinfo/UTC /etc/motd
getwinsz.c
/etc/termcap
/etc/motd

Here’s the complement of that set:

$ perl -le 'for (@ARGV) { print unless -f && -T }' \
    getwinsz.c a.out /etc/termcap /bin /bin/cat \
    /dev/tty /usr/share/zoneinfo/UTC /etc/motd
a.out
/bin
/bin/cat
/dev/tty
/usr/share/zoneinfo/UTC
tchrist
3

Going off Bach's suggestion, I think --mime-encoding is the best flag to get something reliable from file.

file --mime-encoding [FILES ...] | grep -v '\bbinary$'

will print the files that file believes have a non-binary encoding. You can pipe this output through cut -d: -f1 to trim the : encoding if you just want the filenames.


Caveat: as @yugr reports below, .doc files report an encoding of application/mswordbinary. This looks to me like a bug: the MIME type is erroneously being concatenated with the encoding.

$ for flag in --mime --mime-type --mime-encoding; do
    echo "$flag"
    file "$flag" /tmp/example.{doc{,x},png,txt}
  done
--mime
/tmp/example.doc:  application/msword; charset=binary
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary
/tmp/example.png:  image/png; charset=binary
/tmp/example.txt:  text/plain; charset=us-ascii
--mime-type
/tmp/example.doc:  application/msword
/tmp/example.docx: application/vnd.openxmlformats-officedocument.wordprocessingml.document
/tmp/example.png:  image/png
/tmp/example.txt:  text/plain
--mime-encoding
/tmp/example.doc:  application/mswordbinary
/tmp/example.docx: binary
/tmp/example.png:  binary
/tmp/example.txt:  us-ascii
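The demo above suggests a one-liner for a single directory (GNU grep's `\b` word boundary is assumed, and filenames containing colons would confuse the `cut` step):

```shell
# Filenames of the non-binary files in the current directory:
# keep only the lines whose encoding is not "binary", then trim the encoding.
file --mime-encoding ./* | grep -v '\bbinary$' | cut -d: -f1
```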
dimo414
Plain `--mime` does work though (`application/msword; charset=binary`). – yugr Jul 23 '18 at 19:16
  • @yugr that's interesting - it almost looks like a bug in `file`, since a `.docx` file prints `binary` for `--mime-encoding`. – dimo414 Jul 23 '18 at 21:14
    Forgot to report back here, but the [`.doc` bug was fixed](https://bugs.astron.com/view.php?id=18). – dimo414 Jul 23 '20 at 17:37
3

cat+grep

Assuming binary means a file containing NUL characters, this shell command can help:

(cat -v file.bin | grep -q "\^@") && echo Binary || echo Text

or:

grep -q "\^@" <(cat -v file.bin) && echo Binary

This is a workaround for grep -q "\x00", which works with BSD grep but not with the GNU version.

Basically, cat's -v option converts all non-printing characters into visible control-character notation, for example:

$ printf "\x00\x00" | hexdump -C
00000000  00 00                                             |..|
$ printf "\x00\x00" | cat -v
^@^@
$ printf "\x00\x00" | cat -v | hexdump -C
00000000  5e 40 5e 40                                       |^@^@|

where the ^@ sequences represent NUL characters. So once these control characters are found, we assume the file is binary.


The disadvantage of the above method is that it can generate false positives when a file contains literal ^@ sequences that do not represent control characters. For example:

$ printf "\x00\x00^@^@" | cat -v | hexdump -C
00000000  5e 40 5e 40 5e 40 5e 40                           |^@^@^@^@|

See also: How do I grep for all non-ASCII characters.
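A portable alternative that avoids grep's NUL-handling differences (and the false positives above) entirely is to count the NUL bytes with tr. This is my own variant, not from the answer; the helper name is made up:

```shell
# Delete everything except NUL bytes and count what is left;
# a nonzero count means the file contains NULs and is likely binary.
has_nul() {
    [ "$(tr -cd '\0' < "$1" | wc -c)" -gt 0 ]
}

has_nul file.bin && echo Binary || echo Text
```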

kenorb
0

It's kind of brute force to exclude binary files with tr -d "[[:print:]\n\t]" < file | wc -c, but it's not heuristic guesswork either.

find . -type f -maxdepth 1 -exec /bin/sh -c '
   for file in "$@"; do
      if [ $(LC_ALL=C LANG=C tr -d "[[:print:]\n\t]" < "$file" | wc -c) -gt 0 ]; then
         echo "${file} is no ASCII text file (UNIX)"
      else
         echo "${file} is ASCII text file (UNIX)"
      fi
   done
' _ '{}' +

The following brute-force approach using grep -a -m 1 $'[^[:print:]\t]' file seems quite a bit faster, though.

find . -type f -maxdepth 1 -exec /bin/sh -c '
   tab="$(printf "\t")"
   for file in "$@"; do
      if LC_ALL=C LANG=C grep -a -m 1 "[^[:print:]${tab}]" "$file" 1>/dev/null 2>&1; then
         echo "${file} is no ASCII text file (UNIX)"
      else
         echo "${file} is ASCII text file (UNIX)"
      fi
   done
' _ '{}' + 
vron
0

Try the following command-line:

file "$FILE" | grep -vq 'ASCII' && echo "$FILE is binary"
kenorb
user1985553
0

You can also do this by leveraging the diff command. Check this answer:

https://unix.stackexchange.com/questions/275516/is-there-a-convenient-way-to-classify-files-as-binary-or-text#answer-402870

tonix
0

grep

Assuming binary means a file containing non-printable characters (excluding blanks such as spaces, tabs, and newline characters), this may work (with both BSD and GNU grep):

$ grep '[^[:print:][:blank:]]' file && echo Binary || echo Text

Note: GNU grep will report a file containing only NUL characters as text, but the BSD version handles this case correctly.

For more examples, see: How do I grep for all non-ASCII characters.
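Wrapped in a small helper (the function name is mine), with LC_ALL=C to keep the character classes byte-based:

```shell
# Succeeds (exit 0) when the file contains a byte outside the
# printable + blank classes, i.e. when it is likely binary.
is_binary() {
    LC_ALL=C grep -q '[^[:print:][:blank:]]' "$1"
}

is_binary file && echo Binary || echo Text
```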

kenorb
0

Perhaps this would suffice:

if ! file /path/to/file | grep -iq ASCII ; then
    echo "Binary"
fi

if file /path/to/file | grep -iq ASCII ; then
    echo "Text file"
fi
Mike Q