35

I have a small byte array (under 25K) that I receive and decode as part of a larger message envelope. Sometimes it is an image, specifically a JPG. I have no context information other than the byte array itself, and I need to determine both whether it IS an image and whether that image is of type JPG.

Is there some magic number, or magic bytes that exist at the beginning, end or at some offset that I can look at to identify it?

An example of my code looks like this (from memory, not c/p):

byte[] messageBytesAfterDecode = retrieveBytesFromEnvelope();
if(null != messageBytesAfterDecode && messageBytesAfterDecode.length > 0){
    if(areTheseBytesAJpeg(messageBytesAfterDecode)){
        doSomethingWithAJpeg(messageBytesAfterDecode);
    }else{
        flagEnvelopeAsHavingBadContentInTheField();
    }
}

I really need what would go into the

boolean areTheseBytesAJpeg(byte[] mBytes){}

method, or even a pointer to a spec that details it. I'm hoping there is a very quick way to make this determination, since I don't really want to read them into an Image, etc.

DanielV
Kylar

6 Answers

62

From Wikipedia:

JPEG image files begin with FF D8 and end with FF D9.

http://en.wikipedia.org/wiki/Magic_number_(programming)
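As a minimal sketch of how that rule could fill in the areTheseBytesAJpeg method from the question: the check below looks only at the first two and last two bytes of the array. The class name JpegSniffer is hypothetical, and the sketch assumes the whole file is present in the array with nothing appended after the end marker.

```java
/** Hypothetical helper class; the name is illustrative and not from the question. */
final class JpegSniffer {
    private JpegSniffer() {}

    /**
     * Returns true if the array starts with FF D8 and ends with FF D9.
     * This is a cheap plausibility check, not a full validation.
     */
    static boolean areTheseBytesAJpeg(byte[] mBytes) {
        // Need room for both two-byte markers.
        if (mBytes == null || mBytes.length < 4) {
            return false;
        }
        int n = mBytes.length;
        // Java bytes are signed, so mask with 0xFF before comparing.
        boolean startsWithMarker = (mBytes[0] & 0xFF) == 0xFF && (mBytes[1] & 0xFF) == 0xD8;
        boolean endsWithMarker = (mBytes[n - 2] & 0xFF) == 0xFF && (mBytes[n - 1] & 0xFF) == 0xD9;
        return startsWithMarker && endsWithMarker;
    }
}
```

If the data may be truncated or padded, it may be safer to drop the end check and rely on the leading bytes alone.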

nhylated
zsalzbank
22

Some extra info about other file formats alongside JPEG. The start of the file contains these bytes:

BMP : 42 4D
JPG : FF D8 FF E0 (the first 2 bytes are always the same)
PNG : 89 50 4E 47
GIF : 47 49 46 38

When a JPG file uses JFIF or EXIF, the signature is different:

Raw  : FF D8 FF DB  
JFIF : FF D8 FF E0  
EXIF : FF D8 FF E1

some code:

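import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

private static boolean isJPEG(File filename) throws IOException {
    // try-with-resources closes the stream even if readInt() throws
    // (readInt() throws EOFException for files shorter than 4 bytes).
    try (DataInputStream ins = new DataInputStream(new BufferedInputStream(new FileInputStream(filename)))) {
        // Compare only the first three bytes (FF D8 FF); the fourth byte
        // differs between the raw, JFIF and EXIF signatures listed above.
        return (ins.readInt() & 0xFFFFFF00) == 0xFFD8FF00;
    }
}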
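
Since the question starts from a byte array rather than a file, the signature list above could also be checked in memory. The following is only a sketch: the class name ImageSignatures, the method guessFormat and the returned labels are made up for illustration, and only the leading bytes listed in this answer are compared.

```java
/** Hypothetical helper; names and return labels are illustrative only. */
final class ImageSignatures {
    private ImageSignatures() {}

    /** Guesses the format from the leading bytes listed in this answer. */
    static String guessFormat(byte[] data) {
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "JPG";
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "PNG";
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "GIF";
        if (startsWith(data, 0x42, 0x4D)) return "BMP";
        return "UNKNOWN";
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data == null || data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            // Mask because Java bytes are signed.
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Short signatures such as the two-byte BMP prefix can match unrelated data by chance, so a match is a hint rather than proof.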
RATHI
9

Another source of "knowledge" about magic numbers (including for JPEG files) is the magic file used by the GNU/Linux file command.

If you have the file command installed, then file --version will tell you where the magic file lives, and you can read it using a text editor ... and careful reading of man 5 magic.

(And the magic file contents confirm the details of other answers.)

Stephen C
6

Quoting this Wikipedia article:

JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66) also as a null terminated string, followed by more metadata about the file.
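
A rough sketch of how the "JFIF" and "Exif" strings could be used to tell the two flavours apart. It assumes the identifier sits in the first segment, at byte offset 6 (after the two-byte start marker, the two-byte segment marker and the two-byte length), which is the usual layout but is not guaranteed. The class JpegFlavor and its return labels are hypothetical.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper; the name and return labels are illustrative only. */
final class JpegFlavor {
    private JpegFlavor() {}

    /** Classifies a byte array by the 4-character identifier at offset 6. */
    static String describe(byte[] data) {
        // Require the FF D8 start marker and enough bytes to read the identifier.
        if (data == null || data.length < 10
                || (data[0] & 0xFF) != 0xFF || (data[1] & 0xFF) != 0xD8) {
            return "not a JPEG";
        }
        // Assumption: the identifier occupies bytes 6..9 of the file.
        String id = new String(data, 6, 4, StandardCharsets.US_ASCII);
        if (id.equals("JFIF")) {
            return "JPEG/JFIF";
        }
        if (id.equals("Exif")) {
            return "JPEG/Exif";
        }
        // Valid JPEGs may carry neither identifier at this position.
        return "JPEG (other)";
    }
}
```

Treat the identifier as extra information only; the FF D8 start marker remains the more reliable test for "is this a JPEG at all".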

  • 2
    Note, though, that some JPEGs have neither 4A 46 49 46 nor 45 78 69 66 at that position (although most I've seen do). Not an expert on this stuff, but I'm looking at a JPEG that has 50 68 6F 74 at that position; that corresponds to the ASCII "Phot" in "Photoshop," although I've saved JPEGs in several ways from Photoshop and am not able to replicate this. (Photoshop, however, recognizes this file as a JPEG, as does Windows and OS X.) This file contains neither the JFIF nor the Exif markers ANYWHERE. Finally, the file DOES start with FF D8 and end with FF D9 (as it should since it's a JPEG). – James Corcoran Nov 16 '13 at 06:48
4

A lot of formats are identified by so-called magic numbers. These are byte sequences, usually at the front of the file, that indicate whether the binary data that follows is really what you think it is. A quick Google search returned http://www.linfo.org/magic_number.html, and specifically this passage:

"Similarly, a commonly used magic number for JPEG (Joint Photographic Experts Group) image files is 0x4A464946, which is the ASCII equivalent of JFIF (JPEG File Interchange Format). However, JPEG magic numbers are not the first bytes in the file; rather, they begin with the seventh byte. Additional examples include 0x4D546864 for MIDI (Musical Instrument Digital Interface) files and 0x425a6831415925 for bzip2 compressed files."

damg
  • Jfif is not necessarily the same as jpeg. Although, what most people mean when they say jpeg, is actually jfif, as they assume it uses YUV as a color format. – onemasse Jan 06 '11 at 12:05
  • 1
    Please note re @onemasse comment, though, that a lot of JPEGs are Exif, not JFIF, for example many JPEGs taken with digital cameras, many JPEGs saved from Photoshop (which means lots of JPEGs to be found on the web), etc. This is based on my personal experience, but there's more here: http://en.wikipedia.org/wiki/JPEG_File_Interchange_Format. – James Corcoran Nov 16 '13 at 06:31
0

A JPG file does have a specific header that you can use to determine, with high likelihood, that it is a JPG file. However, it's not clear whether you will have the entire file in the byte array.

Anyway, here are the specifics on the header: http://www.fastgraph.com/help/jpeg_header_format.html

Jonathan Wood