SVN:
When you first add or import a file into Subversion, the file is examined to determine if it is a binary file. Currently, Subversion just looks at the first 1024 bytes of the file; if any of the bytes are zero, or if more than 15% are not ASCII printing characters, then Subversion calls the file binary. This heuristic might be improved in the future, however.
http://subversion.apache.org/faq.html#binary-files
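To make the FAQ's description concrete, here is a rough sketch of that heuristic in C. It is not Subversion's actual code. The function and macro names are made up for illustration, and the FAQ does not define "ASCII printing characters", so I assume that means bytes 0x20 to 0x7E plus tab, LF, and CR. Treating an empty file as text is also my assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; the FAQ only gives the number 1024. */
#define SVN_SNIFF_BYTES 1024

/* Returns 1 if the buffer looks binary under the FAQ's description, else 0.
 * Assumption: "printing" means 0x20..0x7E plus tab, LF, and CR. */
int looks_binary_svn_style(const unsigned char *buf, size_t len)
{
    size_t i;
    size_t non_text = 0;

    assert(buf != NULL || len == 0);

    /* Only the first 1024 bytes are examined. */
    if (len > SVN_SNIFF_BYTES)
        len = SVN_SNIFF_BYTES;
    if (len == 0)
        return 0; /* assumption: an empty file counts as text */

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == 0)
            return 1; /* any zero byte means binary */
        if (!((c >= 0x20 && c <= 0x7E) || c == '\t' || c == '\n' || c == '\r'))
            non_text++;
    }

    /* "More than 15%" written in integer arithmetic. */
    return non_text * 100 > len * 15;
}
```

The integer comparison avoids floating point; a buffer with exactly 15% non-printing bytes is still treated as text, matching the "more than 15%" wording.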
Git works in a similar, but simpler, way. Git usually guesses correctly whether a blob contains text or binary data by examining the beginning of the contents: it checks for any occurrence of a zero byte (NUL “character”) in the first 8000 bytes.
http://git-scm.com/docs/gitattributes
And from Git source:
    #define FIRST_FEW_BYTES 8000
    int buffer_is_binary(const char *ptr, unsigned long size)
    {
        if (FIRST_FEW_BYTES < size)
            size = FIRST_FEW_BYTES;
        return !!memchr(ptr, 0, size);
    }
http://git.kernel.org/?p=git/git.git;a=blob;f=xdiff-interface.c;h=0e2c169227ad29b5bf546c6c1b97e1a1d8ed7409;hb=HEAD
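Two consequences of the Git function quoted above are easy to miss: a NUL byte past the first 8000 bytes goes unnoticed, and text in an encoding that contains zero bytes (UTF-16, for example) is classified as binary. The sketch below repeats the function so it compiles on its own; `nul_at_is_detected` is a hypothetical helper I added for illustration, not something from Git.

```c
#include <assert.h>
#include <string.h>

#define FIRST_FEW_BYTES 8000

/* Same logic as the Git snippet above, repeated so this block stands alone. */
int buffer_is_binary(const char *ptr, unsigned long size)
{
    if (FIRST_FEW_BYTES < size)
        size = FIRST_FEW_BYTES;
    return !!memchr(ptr, 0, size);
}

/* Hypothetical helper: fill a 16000-byte buffer with 'a', put a single
 * NUL at offset `pos`, and report whether the check flags it as binary. */
int nul_at_is_detected(unsigned long pos)
{
    static char buf[16000];

    assert(pos < sizeof buf);
    memset(buf, 'a', sizeof buf);
    buf[pos] = 0;
    return buffer_is_binary(buf, sizeof buf);
}
```

With this, `nul_at_is_detected(7999)` returns 1 (the last byte inspected), while `nul_at_is_detected(8000)` returns 0. Likewise `buffer_is_binary("h\0i\0", 4)`, which is "hi" in UTF-16LE, returns 1.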
And @tonfa makes a good point: "Also note that the only place where it cares about a file being text vs. binary is for displaying diff, and for doing merges. The storage format does not care about it."