
I'm trying to check whether a given file is binary or not. I referred to the question linked below for a solution: How can I check if file is text (ASCII) or binary in C

But the solutions given there do not work properly: if I pass a .c file as the argument, it gives the wrong output.

The files I may pass as arguments:

a.out

filename.c

filename.txt

filename.pl

filename.php

So I need to know: is there a function, or some other way, to solve this problem?

Thanks...

Note: If you have any questions, please ask me before downvoting.

Ganapathy
  • What does "binary" mean for you/your task? Is it sufficient to check the file extension, or does it require parsing the file? You might also consider providing your non-working code for the community to check. – maxik Oct 24 '16 at 06:04
  • No, I won't check the extension; I need to parse the file and determine whether it is binary. – Ganapathy Oct 24 '16 at 06:08
  • I'm trying to implement the `grep` command in C. – Ganapathy Oct 24 '16 at 06:08
  • Do you really want to check every byte in the file to make sure it is within the ASCII range? If the file is huge, that would cost you. What about Unicode and the various UTF encodings? Those are also text files. Or a file that is mostly ASCII but contains a few non-ASCII bytes? You can also consider how grep handles it: http://unix.stackexchange.com/questions/19907/what-makes-grep-consider-a-file-to-be-binary – KC Wong Oct 24 '16 at 06:17
  • I'm trying to implement the grep command in C, so I need to check this properly. If the given pattern matches in a binary file, it has to print the message 'pattern matches in the binary file' instead of printing the matched line. – Ganapathy Oct 24 '16 at 06:30
  • If the pattern matches in a normal (text) file, it has to print the matched line from the file. – Ganapathy Oct 24 '16 at 06:31
  • You can use the `isprint`, `isspace`, and `iscntrl` functions in `<ctype.h>` to check every byte. – Ozan Oct 24 '16 at 06:37
  • What did the debugger tell you? – alk Oct 24 '16 at 07:47
  • And to speed up the test of the file contents, take a look at the `strlen` source code: you can model your test on its unrolled loop, which checks 4 bytes per iteration rather than one. From an approach standpoint, you are simply opening the file and then looking for the first character that you define as *non-ASCII*, or EOF, whichever occurs first. If you hit EOF before bailing out due to a binary character, it's ASCII by your definition. – David C. Rankin Oct 24 '16 at 07:51
  • Are you on Linux? If so, just run the command `file <filename>` and look at the output. You can then call this command from your C program using `system()`. See the man page, or check here for more details on the output of the `file` command: https://linux.die.net/man/1/file – Runcy Oommen Oct 24 '16 at 07:52
  • The first two bytes of most files are a 'magic' number that indicates what kind of file it is. – user3629249 Oct 25 '16 at 07:13

2 Answers


You need to clearly define what a binary file is for you, and then base your checking on that.

If you just want to filter based on file extensions, create a list of the extensions you consider binary and a list of the ones you don't, and then check against those lists.
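A minimal sketch of the two-list approach, assuming the lists contain only the files mentioned in the question plus a few common object-file extensions. The function `classify_by_extension` and both lists are hypothetical and would need to be adjusted for real use. Matching is case-sensitive, and only the last dot in the file name (not in the directory path) is considered.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lists; extend them to suit your needs. */
static const char *const text_exts[]   = { ".c", ".txt", ".pl", ".php" };
static const char *const binary_exts[] = { ".out", ".o", ".so" };

static int in_list(const char *ext, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(ext, list[i]) == 0)
            return 1;
    return 0;
}

/* Returns 1 for a known text extension, 0 for a known binary extension,
 * and -1 if the extension is missing or in neither list. */
int classify_by_extension(const char *path)
{
    const char *base = strrchr(path, '/');    /* skip directory components */
    base = base ? base + 1 : path;
    const char *dot = strrchr(base, '.');
    if (dot == NULL)
        return -1;                            /* no extension at all */
    if (in_list(dot, text_exts, sizeof text_exts / sizeof text_exts[0]))
        return 1;
    if (in_list(dot, binary_exts, sizeof binary_exts / sizeof binary_exts[0]))
        return 0;
    return -1;
}
```

The third "unknown" result matters in practice: a file such as `README` has no extension, so an extension-based check alone cannot classify it and you would have to fall back to inspecting its contents.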

If you have a list of known formats rather than file extensions, attempt to parse the file as each of those formats; if it doesn't parse, or the parse result makes no sense, it's a binary file (for your purposes).
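As one example of "parse it as a known format", the sketch below treats data as text only if it decodes as well-formed UTF-8 (which includes plain ASCII) and contains no NUL bytes. This also addresses the Unicode concern raised in the comments. That definition of "text" is an assumption, and `is_valid_utf8_text` is a hypothetical name. It validates a buffer already read into memory. If you only read a prefix of the file, a multi-byte sequence cut off at the end of the buffer will be reported as invalid.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8 with
 * no NUL bytes, 0 otherwise. Rejects overlong encodings, surrogates
 * (U+D800..U+DFFF), code points above U+10FFFF, and truncated sequences. */
int is_valid_utf8_text(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t need;          /* number of continuation bytes expected */
        unsigned long cp;     /* code point being decoded */
        unsigned long min;    /* smallest code point valid for this length */

        if (c == 0x00)
            return 0;                              /* NUL: treat as binary */
        if (c < 0x80) { i++; continue; }           /* plain ASCII byte */
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else
            return 0;                              /* invalid lead byte */

        if (len - i <= need)
            return 0;                              /* sequence cut off */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;                          /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;                              /* overlong, out of range, surrogate */
        i += need + 1;
    }
    return 1;
}
```

Unlike a pure ASCII check, this accepts source files that contain UTF-8 comments or string literals. It still rejects files in other encodings (for example Latin-1 or UTF-16), which you may or may not want to count as text.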

Magisch
  • On UNIX-like platforms, there is also `libmagic`, a C interface to [file](https://en.wikipedia.org/wiki/File_%28command%29), which could be used to detect file types. – kdmurrray91 Oct 24 '16 at 07:55

Depending on your OS, executable binary files begin with a header that identifies them as executables and contains information about them (architecture, size, byte order, etc.). By trying to parse this header, you can tell whether a file is such a binary. If you are on a Mac, check the Mach-O file format; if you are on Linux, it should be the ELF format. But beware, there is a lot of documentation to read.

KeylorSanchez