One approach is to use the "magic number" convention and read the first few bytes of a file.
http://www.astro.keele.ac.uk/oldusers/rno/Computing/File_magic.html
Examples:
"BM" is a Bitmap image
"GIF8" is a GIF image
"\xff\xd8\xff\xe0" is a JPEG (JFIF) image; JPEGs written with Exif metadata start with "\xff\xd8\xff\xe1" instead
Example in Ruby:
def bitmap?(data)
  return data[0, 2] == "BM"
end
def gif?(data)
  return data[0, 4] == "GIF8"
end
def jpeg?(data)
  # .b makes the literal binary so it compares equal to bytes read in 'rb' mode
  return data[0, 4] == "\xff\xd8\xff\xe0".b
end
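def file_is_image?(filename)
  # 'rb' opens the file in binary mode; the block form closes it automatically
  data = File.open(filename, 'rb') { |f| f.read(9) } # magic numbers are up to 9 bytes
  data ||= '' # read returns nil for an empty file
  # use || here, because `or` binds more loosely than `return`
  return bitmap?(data) || gif?(data) || jpeg?(data)
end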
Why use this instead of the file name extension or the filemagic module?
Because it lets us detect the data type before writing any data to disk. For example, we can read the first bytes of an upload data stream, and if the magic number doesn't match the content type declared by the web form, we can report an error immediately.
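A minimal sketch of that upload check, assuming the upload arrives as an IO-like object that responds to read and rewind (a StringIO stands in for it here). The names MAGIC_BY_CONTENT_TYPE and upload_matches_content_type? are hypothetical, chosen for illustration, and the JPEG entry checks only the three bytes that JFIF and Exif files share.

```ruby
require 'stringio'

# Hypothetical table: declared content type => expected leading bytes.
# .b marks each literal as binary so it compares equal to bytes read from the stream.
MAGIC_BY_CONTENT_TYPE = {
  'image/bmp'  => 'BM'.b,
  'image/gif'  => 'GIF8'.b,
  'image/jpeg' => "\xff\xd8\xff".b # prefix shared by JFIF and Exif JPEGs
}.freeze

# Returns true when the stream starts with the magic number expected for
# the declared content type. Reads only the leading bytes, then rewinds
# so the caller can still consume the whole stream afterwards.
def upload_matches_content_type?(io, content_type)
  magic = MAGIC_BY_CONTENT_TYPE[content_type]
  return false if magic.nil? # unknown content type: reject

  head = io.read(magic.bytesize).to_s.b # read returns nil on an empty stream
  io.rewind
  head == magic
end

upload = StringIO.new('GIF89a...rest of the image data'.b)
upload_matches_content_type?(upload, 'image/gif')  # => true
upload_matches_content_type?(upload, 'image/jpeg') # => false
```

A real upload stream may not support rewind; in that case the bytes already read would need to be kept and written out ahead of the rest of the stream.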
We implement our real-world code slightly differently. We create a hash: each key is a magic number string, each value is a symbol like :bitmap, :gif, :jpeg, etc. If anyone would like to see our real-world code, feel free to contact me here.
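For readers who want a starting point, the hash-based idea could look roughly like the following. This is a sketch under assumptions, not the real-world code mentioned above: the constant MAGIC_NUMBERS and the method image_type are hypothetical names.

```ruby
# Hypothetical lookup table: magic number string => image type symbol.
MAGIC_NUMBERS = {
  'BM'.b               => :bitmap,
  'GIF8'.b             => :gif,
  "\xff\xd8\xff\xe0".b => :jpeg
}.freeze

# Returns the symbol for the first magic number that prefixes data,
# or nil when nothing matches.
def image_type(data)
  data = data.to_s.b # compare raw bytes, whatever encoding the caller used
  MAGIC_NUMBERS.each do |magic, type|
    return type if data.start_with?(magic)
  end
  nil
end

image_type('GIF89a'.b)   # => :gif
image_type('plain text') # => nil
```

Adding a new format then means adding one entry to the hash rather than writing another predicate method.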