
If I have a pointer to TIFF data, but no indication of the size, is there any way to accurately calculate it?

I've gone through several different ideas, all of which work most of the time, but not always, since there's just so many different ways to format a TIFF, and I figured there has to be an easier way to do this. Right now, the closest I've gotten is:

ULONG readImageHeader(char* image)
{
TIF_HDR       *xTIFHdr;
TIF_IFD       *xTIFIFD;
TIF_IFD_ENTRY *pxTIFIFDEntry;
UCHAR         *pHdrPtr;
USHORT         i;
ULONG length  = 0;
ULONG imgLength = 0;
ULONG count = 0;

// check to see if it is a TIFF header
xTIFHdr = (TIF_HDR *)image;

// Little Endian
if (xTIFHdr->usTIFID == TIF_HEAD_LITTLE)
{
    pHdrPtr = (UCHAR*)image;
    pHdrPtr += xTIFHdr->ulFirstIFDOffset;

    // read TIF IFD
    xTIFIFD = (TIF_IFD *)pHdrPtr;

    // Look at all the IFD entries and set internal image hdr
    pHdrPtr += TIF_IFD_LEN;
    pxTIFIFDEntry = (TIF_IFD_ENTRY *)pHdrPtr;

    // iterate through each IFD entry
    for (i=0; i<xTIFIFD->usNumIFDEntries; i++)
    {
        if(length <= (ULONG)pxTIFIFDEntry->ulTIFValueOffset)
        {
            length = (ULONG)pxTIFIFDEntry->ulTIFValueOffset;

            // the TIF length is in units of the TIF type
            switch(pxTIFIFDEntry->usTIFType)
            {
            case TIF_BYTE:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength * TIF_BYTE_SIZE;
                break;
            case TIF_ASCII:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength * TIF_ASCII_SIZE;
                break;
            case TIF_SHORT:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength * TIF_SHORT_SIZE;
                break;
            case TIF_LONG:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength * TIF_LONG_SIZE;
                break;
            case TIF_RATIONAL:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength * TIF_RATIONAL_SIZE;
                break;
            default:
                length += (ULONG)pxTIFIFDEntry->ulTIFLength;
                break;
            }
        }
        switch (pxTIFIFDEntry->usTIFTag)
        {
        case TIF_STRIP_BYTE_COUNTS:
        case TIF_STRIP_OFFSETS:
        {
            ULONG valueOffset = (ULONG)pxTIFIFDEntry->ulTIFValueOffset;
            count = (ULONG)pxTIFIFDEntry->ulTIFLength;

            // if the count > 1, then the valueOffset actually represents an offset
            if(count > 1)
            {
                ULONG countsize = (count - 1) * sizeof(ULONG);
                imgLength += *(ULONG*) ((UCHAR*)image + valueOffset + countsize);
            }
            else
            {
                // if count is 1, then the valueOffset is really just the value of that item
                imgLength += valueOffset;
            }
            break;
        }
        default:
            break;
        }
        pxTIFIFDEntry++;
    }

    // the length is the largest offset, plus the length of that item
    // the imgLength is the offset of the image, plus the size of the image, which is stored as two separate tags
    // return the largest of them
    return(length > imgLength ? length : imgLength);
}
// Big Endian
else if(xTIFHdr->usTIFID == TIF_HEAD_BIG)
{
    // I don't care about this
    printf("Big Endian TIFF image\n");
    return(0);
}

printf("Invalid TIFF image\n");
return(0);
}

Essentially what I'm doing here is I'm iterating through the TIFF header, and calculating two running sums: (largest offset + data length) and (strip offset + strip byte count). Then I just use the larger of the two values.

This mostly works, except that sometimes the ulTIFValueOffset is not an offset at all, but the actual value. In (some of) those cases, I'm getting a file size that is too big. So far, all my failed examples have been when it's grabbing the Width or Length tag, although I can't rule out the possibility that other tags could have the same problem.

Is there either

  1. A way to calculate the file size given the headers? or
  2. A way to know if the headers are a value or an offset?

Thanks!

  • [Here](http://www.fileformat.info/format/tiff/corion.htm) is an extensive format description. Or just find a ready-made library. – Eugene Sh. May 26 '15 at 15:03
  • As the comment suggested, I highly recommend you get a library for TIFF image handling (usually libtiff). You may go crazy trying to manipulate all the various types of TIFF files out there. Your code may not even work for a totally different file that is deemed as TIFF. – PaulMcKenzie May 26 '15 at 15:18
  • I've looked at various TIFF libraries, but none of them seem to calculate size. I only need the size so that I can copy the image in memory; I don't need to do anything with the image after that. I'm currently investigating how to open an image from memory using libtiff, and hopefully from there I can get the size. – Robert Dole May 26 '15 at 15:27
  • @EugeneSh. Referencing an out of date version of the TIFF specification (5.0 versus 6.0 circa 1992) isn't that useful. – mctylr May 26 '15 at 16:14
  • @RobertDole is this a generated image such that there is no possibility of finding any filename the data originated from? (I'm sure you thought about it, but if you can get the filename, you have the size) – David C. Rankin May 26 '15 at 16:28
  • @DavidC.Rankin: Yes. I'm replacing an existing library, so unfortunately, I have to maintain the same function signatures and won't have access to the actual file. – Robert Dole May 26 '15 at 16:37

2 Answers


The pragmatic answer is: unless you absolutely must, don't handle image formats directly yourself. Use an image library. For TIFF, there are a variety of free (libre and/or gratis) graphics file libraries, including libTIFF, ImageMagick / GraphicsMagick, DevIL, FreeImage, and others.

The TIFF image format is very powerful and flexible, but at the expense of being arguably the most complex image format, as outlined in the TIFF 6.0 specification. In addition, current implementations also incorporate TIFF Technical Note #2 for JPEG support, plus the BigTIFF draft.

> I've gone through several different ideas, all of which work most of the time, but not always, since there's just so many different ways to format a TIFF

This is why I recommend using an image library.


> If I have a pointer to TIFF data, but no indication of the size, is there any way to accurately calculate it?

If by "the TIFF data" you mean the TIFF image itself, then no, not that I know of. You cannot determine the size of a TIFF image (on disk or in memory) without parsing it.

> A way to calculate the file size given the headers?

Using only the 8-byte image file header, no.

By parsing the Image File Directory (IFD) you may be able to calculate the value.

> A way to know if the headers are a value or an offset?

You should be able to determine whether an IFD (Image File Directory, terminology from the TIFF specification) entry's ValueOffset field holds a value or an offset. It holds the value itself if and only if the value fits within 4 bytes (the size of the ValueOffset field); whether it fits is determined by the entry's Type and Count. (Ref: TIFF 6.0 specification: TIFF Structure - Value/Offset)
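As a rough sketch of that rule, the check below depends only on an entry's type and count. It is not taken from the question's code: the names `tiff_type_size` and `tiff_value_is_inline` are made up for illustration, and the type numbers are the twelve field types listed in TIFF 6.0.

```c
#include <stdint.h>

/* Size in bytes of one element of a TIFF 6.0 field type (1..12).
 * Returns 0 for a type this sketch does not know about. */
static uint32_t tiff_type_size(uint16_t type)
{
    switch (type) {
    case 1:  /* BYTE      */
    case 2:  /* ASCII     */
    case 6:  /* SBYTE     */
    case 7:  /* UNDEFINED */
        return 1;
    case 3:  /* SHORT     */
    case 8:  /* SSHORT    */
        return 2;
    case 4:  /* LONG      */
    case 9:  /* SLONG     */
    case 11: /* FLOAT     */
        return 4;
    case 5:  /* RATIONAL  */
    case 10: /* SRATIONAL */
    case 12: /* DOUBLE    */
        return 8;
    default:
        return 0;
    }
}

/* Returns 1 if the 4-byte ValueOffset field holds the value itself,
 * 0 if it holds an offset to the value, and -1 if the type is unknown. */
static int tiff_value_is_inline(uint16_t type, uint32_t count)
{
    uint32_t size = tiff_type_size(type);
    if (size == 0)
        return -1;
    /* Same test as count * size <= 4, written as a division so the
     * multiplication cannot overflow on a bogus count. */
    return count <= 4 / size;
}
```

ImageWidth and ImageLength are normally SHORT or LONG with a count of 1, so their values are stored inline rather than at an offset, which would explain the oversized results reported for those two tags.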

mctylr

I'm interpreting your question to be "all I have is a blind pointer to data which is allegedly a TIFF. Can I determine the size of the block of memory allocated to that pointer?"

As for determining block size just from TIFF data alone, the answer is: sometimes, but in the general case no, and certainly not safely.

TIFF IFD structures are built as a conceptual linked list, with the last 4 bytes of any IFD holding the offset of the next IFD or 0. I have a collection of broken TIFFs for testing my TIFF library, which demonstrates that some people who write code to write TIFFs can't even get this simple task right. I frequently see IFD offsets or data offsets that point off into space somewhere. If you write in-memory IFD traversal code without knowing the limits of your block of memory, you'll be lucky if you get a segmentation fault when you traipse through your heap.
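To make the linked-list structure concrete, here is a minimal sketch of a bounds-checked walk over the IFD chain. It assumes something the question does not have, namely a known upper bound `len` on the buffer, and it handles only little-endian classic TIFF (not BigTIFF). The names `count_ifds_le`, `get_u16_le`, `get_u32_le`, and the `max_ifds` loop guard are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise little-endian readers; safe at any alignment. */
static uint16_t get_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Counts the IFDs in a little-endian classic TIFF held in buf[0..len).
 * Returns -1 if the header is wrong, if any IFD would extend past len,
 * or if more than max_ifds are chained (which suggests a cycle). */
static int count_ifds_le(const unsigned char *buf, size_t len, int max_ifds)
{
    if (len < 8 || buf[0] != 'I' || buf[1] != 'I' || get_u16_le(buf + 2) != 42)
        return -1;

    uint32_t off = get_u32_le(buf + 4);   /* offset of the first IFD */
    int n = 0;

    while (off != 0) {
        if (n >= max_ifds)
            return -1;                    /* too many IFDs: likely a loop */
        if (off > len || len - off < 2)
            return -1;                    /* entry count lies out of bounds */

        uint16_t entries = get_u16_le(buf + off);
        /* 2-byte count + 12 bytes per entry + 4-byte next-IFD offset */
        size_t ifd_size = 2 + (size_t)entries * 12 + 4;
        if (len - off < ifd_size)
            return -1;                    /* IFD is truncated */

        off = get_u32_le(buf + off + 2 + (size_t)entries * 12);
        n++;
    }
    return n;
}
```

Every check here relies on `len`. Without a trustworthy bound, the same loop would simply follow whatever offsets the data contains.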

TIFF is a deceptive file format. A cursory look indicates that it's straightforward, but there are so many screwy special cases that code that consumes TIFF needs to handle those cases and the cases where producers botched the special cases.

Even if you write a full consumer that skims all the IFDs and all the offset tags and tries to figure out which is furthest in the data, there is still no guarantee that the data isn't truncated (I have several files of this stripe) nor that there isn't more junk data after the last IFD (I have several files of that kind).

If you decide to write code to traverse the file (and I don't recommend that you do), you should consider an abstraction layer for reading data into structs rather than blind casting, as TIFF data offsets do not have to obey any particular word/long-word alignment, and that may cause you grief.
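One possible shape for such an abstraction layer is sketched below, assuming little-endian classic TIFF only. The struct `ifd_entry` and the functions `rd_u16_le`, `rd_u32_le`, and `read_ifd_entry_le` are illustrative names, not part of any library. Each field is assembled byte by byte, so the source pointer may sit at any alignment and the host's byte order does not matter.

```c
#include <stdint.h>

/* Decoded copy of one 12-byte IFD entry, in host byte order. */
struct ifd_entry {
    uint16_t tag;
    uint16_t type;
    uint32_t count;
    uint32_t value_offset;   /* raw 4-byte field: a value or an offset */
};

/* Assemble integers from individual bytes instead of casting the
 * pointer, so unaligned addresses are never dereferenced as wider types. */
static uint16_t rd_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decodes the 12 bytes at p. The caller is responsible for making
 * sure those 12 bytes are inside the buffer. */
static struct ifd_entry read_ifd_entry_le(const unsigned char *p)
{
    struct ifd_entry e;
    e.tag          = rd_u16_le(p);
    e.type         = rd_u16_le(p + 2);
    e.count        = rd_u32_le(p + 4);
    e.value_offset = rd_u32_le(p + 8);
    return e;
}
```

A big-endian variant would differ only in the two reader functions, which keeps the byte-order decision in one place.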

plinth