In my server-side Git pre-receive hook I get the list of files in a commit with this command:

my @new_file_list = `git diff --name-only $old..$new`;

I then get the size of each file in a loop:

foreach my $file (@new_file_list)
{
  chomp $file;  # strip the trailing newline left by the backticks
  my $size = `git cat-file -s $new:$file`;
  # ... the binary check would go here
}

Which command gives me a boolean that tells me whether a file in the commit is binary? As input I have the filename and the $old and $new revisions.

Thanks in advance.

dice2011
  • You can only guess. You can try `file`, or [MIME::Detect](http://p3rl.org/MIME::Detect), [File::LibMagic](http://p3rl.org/File::LibMagic), [File::MMagic](http://p3rl.org/File::MMagic), ... – choroba Jun 07 '17 at 10:50
  • How do you define "binary file"? – ikegami Jun 07 '17 at 14:40

3 Answers

Perl has a number of file test operators which will tell you various things about a file (you're already using -s). These include the following (taken from the documentation):

-T File is an ASCII or UTF-8 text file (heuristic guess).

-B File is a "binary" file (opposite of -T).

It's worth emphasising that this is just a heuristic. Perl examines the start of the file and checks what proportion of the characters appear to be printable.

So you can use these in your code like this:

if (-B $filename) {
  # file is (probably) binary
}
Dave Cross
  • The `-s` switch you see in the OP is a switch to `git cat-file`, not to Perl. But apparently it has the same meaning (fetch the size). – PerlDuck Jun 07 '17 at 11:29
It depends on what exactly you want to achieve and at what cost. If you want to prevent accidental commits of compilation output, just add a .gitignore file that excludes it (this is always a good idea; also exclude backup copies and editor temporary files), and in the hook check whether the extension of each committed file is on an allow list.
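The extension check could look roughly like the sketch below. The function name `extension_allowed` and the contents of `ALLOWED_EXTENSIONS` are made up for illustration; the real list depends on what the project considers source files.

```shell
# Hypothetical allow list; replace with the extensions your project accepts.
ALLOWED_EXTENSIONS="pl pm c h txt md"

# Succeeds (exit status 0) if the extension of path $1 is on the allow list.
extension_allowed() {
  name=${1##*/}               # strip any directory part
  case "$name" in
    *.*) ext=${name##*.} ;;   # text after the last dot
    *)   return 1 ;;          # no extension at all: reject
  esac
  for allowed in $ALLOWED_EXTENSIONS; do
    [ "$ext" = "$allowed" ] && return 0
  done
  return 1
}
```

In the hook, each name printed by `git diff --name-only` can be passed to the function, and the push rejected when it fails. Note that this sketch rejects files without any extension (a `Makefile`, for example), which may or may not be what you want.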

The aforementioned -T/-B check in Perl is nice; however, it's worth reading its documentation first. It's less efficient than checking the extension, but it bases the answer on the real content of the file.

If the -B/-T heuristic, as described in the documentation, does not suit your needs, use file. In Perl, a few packages are already available:

File::Type
File::LibMagic
File::MMagic

You will receive the MIME type of the file and you need to write some logic to interpret the result.
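As a rough idea of what that interpretation logic might be, here is a small shell sketch. The function name `mime_is_binary` is hypothetical, and the list of `application/*` types treated as text is only an assumption; extend it to whatever you consider text.

```shell
# Succeeds (exit status 0) if the MIME type in $1 should be treated as binary.
mime_is_binary() {
  case "$1" in
    text/*) return 1 ;;                             # any text/* type is text
    application/json|application/xml) return 1 ;;   # assumed textual formats
    *) return 0 ;;                                  # everything else: binary
  esac
}
```

With the `file` utility, the type of a file on disk can be obtained with `file -b --mime-type "$path"` and handed to the function, e.g. `mime_is_binary "$(file -b --mime-type "$path")"`.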

Unless there are specific requirements that we are not aware of, I personally would stick to .gitignore and checking extensions. Optionally, you can consider some kind of stick (preferably virtual, as corporal punishment is generally frowned upon) for developers who commit forbidden files to the repository.

Harini
ArturFH
You may have noticed that Git sometimes tells you “binary files ... and ... differ”.

According to this answer to a similar question, Git decides whether a file is binary by looking at its first 8,000 bytes. If they contain a NUL byte, Git considers the file to be binary.
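If you want to apply the same rule yourself, it can be imitated in a few lines of shell. This is only a sketch of the heuristic as described above, not Git's own code, and the function name `has_nul_in_head` is made up.

```shell
# Reads data on stdin; succeeds (exit status 0) if the first 8000 bytes
# contain at least one NUL byte.
has_nul_in_head() {
  # Keep only the NUL bytes of the first 8000 bytes and count them.
  nul_count=$(head -c 8000 | LC_ALL=C tr -cd '\000' | wc -c)
  [ "$((nul_count))" -gt 0 ]
}
```

In the hook, the content of a file in the pushed commit could be piped in with something like `git cat-file blob "$new:$file" | has_nul_in_head`.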

You can use git diff in your hook and let Git decide:

if git diff --numstat "$old" "$new" -- "$file" | grep -q -P -e '^-\t-\t'; then
    echo "binary"
else
    echo "text"
fi

This is even possible for files that have nothing to do with Git and are not in any repository. If

git diff --no-index --numstat /dev/null $some_file

prints dash-TAB-dash-TAB then the file is binary (from Git's point of view). From the docs:

git diff --no-index [--options] [--] [<path>…]

This form is to compare the given two paths on the filesystem.

...

--numstat

Similar to --stat, but shows number of added and deleted lines in decimal notation and pathname without abbreviation, to make it more machine friendly. For binary files, outputs two - instead of saying 0 0.
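If your grep has no -P option, the dash-TAB-dash-TAB prefix can also be tested in plain shell. A small sketch, with a made-up function name, that classifies a single line of --numstat output:

```shell
# Succeeds (exit status 0) if the --numstat line in $1 describes a binary
# file, i.e. it starts with "-<TAB>-<TAB>".
numstat_line_is_binary() {
  tab=$(printf '\t')
  case "$1" in
    "-${tab}-${tab}"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

It could be called as `numstat_line_is_binary "$(git diff --numstat "$old" "$new" -- "$file")"`, assuming the diff prints exactly one line for that file.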

PerlDuck