Possible Duplicate:
How can I determine if a file is binary or text in c#?
Without considering the file name (the extension), and using only the content, we need to know whether a file is text or binary. I can’t use the extension because I don’t know every text file extension, and because a text file may have no extension at all.
I was doing it by measuring the percentage of non-ASCII bytes in the first part of the file. I cannot read the full file each time, for performance reasons. I was using the following code:
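    private static bool IsBinary(byte[] bytes, int maxLength)
    {
        // Inspect at most the first 1024 bytes, and never more than the buffer holds.
        int len = maxLength > 1024 ? 1024 : maxLength;
        if (len > bytes.Length)
            len = bytes.Length;
        if (len <= 0)
            return false; // nothing to inspect

        int nonASCIIcount = 0;
        for (int i = 0; i < len; ++i)
            if (bytes[i] > 127)
                ++nonASCIIcount;

        // If more than 30% of the inspected bytes are non-ASCII,
        // treat the file as binary.
        // The cast matters: integer division would truncate the ratio to 0 or 1.
        return (double)nonASCIIcount / len > 0.3;
    }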
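For context, here is a minimal sketch of how only the first part of a file could be read into the buffer, so the whole file is never loaded. `FilePrefix` and `ReadPrefix` are hypothetical names chosen for illustration, and the byte limit is passed in by the caller.

```csharp
using System;
using System.IO;

public static class FilePrefix
{
    // Hypothetical helper: returns at most maxBytes from the start of the file.
    public static byte[] ReadPrefix(string path, int maxBytes)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] buffer = new byte[maxBytes];
            int total = 0;

            // Stream.Read may return fewer bytes than requested, so keep reading
            // until the buffer is full or the end of the file is reached.
            while (total < maxBytes)
            {
                int read = stream.Read(buffer, total, maxBytes - total);
                if (read == 0)
                    break;
                total += read;
            }

            // Trim the buffer when the file is shorter than maxBytes.
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

The result could then be passed on as `IsBinary(head, head.Length)`, where `head` is the array returned by `ReadPrefix(path, 1024)`.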
The problem is that some kinds of files, such as Photoshop files, are wrongly detected as text because the first part of the file is text.
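A small sketch that reproduces the misdetection, assuming the same 1024-byte window and 30% threshold described above. The sample data is made up (it is not the real Photoshop format), and `MisdetectionDemo`, `LooksBinary`, and `BuildSample` are illustrative names.

```csharp
using System;
using System.Text;

public static class MisdetectionDemo
{
    // Same heuristic as above: share of bytes > 127 within the first 1024 bytes.
    public static bool LooksBinary(byte[] bytes)
    {
        int len = Math.Min(bytes.Length, 1024);
        if (len == 0)
            return false;

        int nonAscii = 0;
        for (int i = 0; i < len; ++i)
            if (bytes[i] > 127)
                ++nonAscii;

        return (double)nonAscii / len > 0.3;
    }

    // Made-up sample: 1024 bytes of ASCII text followed by 4096 bytes of 0xFF.
    public static byte[] BuildSample()
    {
        byte[] sample = new byte[1024 + 4096];
        byte[] header = Encoding.ASCII.GetBytes(new string('A', 1024));
        Array.Copy(header, sample, header.Length);
        for (int i = 1024; i < sample.Length; ++i)
            sample[i] = 0xFF;
        return sample;
    }
}
```

`MisdetectionDemo.LooksBinary(MisdetectionDemo.BuildSample())` returns `false`, because all of the non-ASCII bytes lie beyond the inspected window, even though 80% of the sample is binary data.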
Any suggestions?