
I have a big binary file with lots of files stored inside it. I'm trying to copy the data of a PCX image from the file and write it to a new file which I can then open in an image editor.

After obtaining the specs for the PCX header, I think I've located the image in the big binary file. My problem is that I can't figure out how many bytes I'm supposed to read after the header. I've read about decoding PCX files, but I don't want to decode anything: I want to read the encoded image data and write it to a separate file that the image editor can open.

Here is the header. I've included this image's values, since I guess they can be used to determine where the image data ends.

```c
struct PcxHeader
{
    BYTE Identifier;        // PCX Id Number (Always 0x0A) // 10
    BYTE Version;           // Version Number              // 5
    BYTE Encoding;          // Encoding Format             // 1
    BYTE BitsPerPixel;      // Bits per Pixel              // 8
    WORD XStart;            // Left of image               // 0
    WORD YStart;            // Top of Image                // 0
    WORD XEnd;              // Right of Image              // 319
    WORD YEnd;              // Bottom of image             // 199
    WORD HorzRes;           // Horizontal Resolution       // 320
    WORD VertRes;           // Vertical Resolution         // 200
    BYTE Palette[48];       // 16-Color EGA Palette
    BYTE Reserved1;         // Reserved (Always 0)
    BYTE NumBitPlanes;      // Number of Bit Planes        // 1
    WORD BytesPerLine;      // Bytes per Scan-line         // 320
    WORD PaletteType;       // Palette Type                // 0
    WORD HorzScreenSize;    // Horizontal Screen Size      // 0
    WORD VertScreenSize;    // Vertical Screen Size        // 0
    BYTE Reserved2[54];     // Reserved (Always 0)
};
```
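Reading the header straight into the struct relies on the compiler adding no padding and on a little-endian host. A minimal sketch that instead decodes the fields byte by byte from a buffer (assuming the 128 header bytes are already in memory) could look like this; `PcxInfo`, `pcx_read_le16` and `pcx_parse_header` are hypothetical names, and `<stdint.h>` types stand in for the Windows `BYTE`/`WORD` typedefs. The offsets follow the struct above.

```c
#include <stddef.h>
#include <stdint.h>

#define PCX_HEADER_SIZE 128

/* Hypothetical subset of the header fields needed to size the image data. */
typedef struct {
    uint8_t  identifier;      /* always 0x0A */
    uint8_t  version;
    uint8_t  encoding;        /* 1 = RLE */
    uint8_t  bits_per_pixel;
    uint16_t x_start, y_start, x_end, y_end;
    uint8_t  num_bit_planes;
    uint16_t bytes_per_line;
} PcxInfo;

/* PCX stores every WORD field little-endian. */
static uint16_t pcx_read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Parses the header at buf[0]; len is the number of bytes available.
   Returns 0 on success, -1 if the bytes do not look like a PCX header. */
static int pcx_parse_header(const uint8_t *buf, size_t len, PcxInfo *out)
{
    if (len < PCX_HEADER_SIZE || buf[0] != 0x0A || buf[1] > 5 || buf[2] != 1)
        return -1;
    out->identifier     = buf[0];
    out->version        = buf[1];
    out->encoding       = buf[2];
    out->bits_per_pixel = buf[3];
    out->x_start        = pcx_read_le16(buf + 4);
    out->y_start        = pcx_read_le16(buf + 6);
    out->x_end          = pcx_read_le16(buf + 8);
    out->y_end          = pcx_read_le16(buf + 10);
    out->num_bit_planes = buf[65];
    out->bytes_per_line = pcx_read_le16(buf + 66);
    if (out->x_end < out->x_start || out->y_end < out->y_start)
        return -1;                /* inconsistent bounds: probably not a header */
    return 0;
}
```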
fearofawhackplanet
links77
  • You need to decode the binary container format, not the PCX. It most likely stores the individual file sizes in some directory table, or uses a chunk format like RIFF where each chunk (file) is stored sequentially with an ID and a size. If it holds many PCX files, you can find the individual headers and use the distance between them as the file size, assuming the PCX files are stored consecutively with no gaps between them in the bin file. – Spektre Apr 25 '17 at 17:17
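Spektre's header-distance idea could be sketched as follows. It assumes the embedded PCX files are stored back to back with no gaps, and it uses a loose heuristic (identifier 0x0A, version at most 5, encoding 1, a plausible bits-per-pixel and plane count, Reserved1 zero) that can also match random bytes inside pixel data, so the offsets it returns are only candidates. `looks_like_pcx_header`, `find_pcx_headers` and `pcx_gap_size` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Heuristic: do the bytes at p look like the start of a PCX header? */
static int looks_like_pcx_header(const uint8_t *p, size_t avail)
{
    if (avail < 128)
        return 0;
    if (p[0] != 0x0A || p[1] > 5 || p[2] != 1)
        return 0;
    if (p[3] != 1 && p[3] != 2 && p[3] != 4 && p[3] != 8)
        return 0;
    if (p[64] != 0 || p[65] < 1 || p[65] > 4)   /* Reserved1, NumBitPlanes */
        return 0;
    return 1;
}

/* Stores up to max_hits candidate header offsets; returns how many were found. */
static size_t find_pcx_headers(const uint8_t *blob, size_t len,
                               size_t *offsets, size_t max_hits)
{
    size_t n = 0;
    for (size_t i = 0; i + 128 <= len && n < max_hits; i++) {
        if (looks_like_pcx_header(blob + i, len - i)) {
            offsets[n++] = i;
            i += 127;             /* headers cannot overlap: skip past this one */
        }
    }
    return n;
}

/* Size of candidate k = distance to the next header, or to the end of blob. */
static size_t pcx_gap_size(const size_t *offsets, size_t n, size_t k, size_t len)
{
    return (k + 1 < n ? offsets[k + 1] : len) - offsets[k];
}
```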

2 Answers


There are three components to the PCX file format:

  • 128-byte header (not all of it is actually used, but it is always 128 bytes long)
  • variable-length image data
  • optional 256-color palette (though improper PCX files exist with palette sizes other than 256 colors).

From the Wikipedia article:

> Due to the PCX compression scheme the only way to find the actual length of the image data is to read and process it. This effort is made difficult because the format allows for the compressed data to run beyond the image dimensions, often padding it to the next 8 or 16 line boundary.

In general, then, it sounds like you'll have to do a "deep process" of the image data to find the complete PCX file embedded within your larger binary file.

fbrereto
  • It doesn't have to be fully decoded. For instance if you're reading the image data and you know the next 42 bytes are pixels you can skip over them without decoding them, knowing that your overall goal is to simply get to the end of the image data. – fbrereto Nov 06 '09 at 20:58
  • PCX files with 16-colour, 8-colour or 2-colour palettes are not "improper" at all. They're simply used for 4-bit, 3-bit and 1-bit images, all of which are perfectly valid in PCX. But those should be stored in the header, not at the end. – Nyerguds Aug 05 '19 at 11:19
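fbrereto's point can be sketched by walking the RLE stream and only counting how many decoded bytes each token stands for, never writing any pixels. This is a minimal sketch assuming standard PCX RLE (a byte with the top two bits set holds a run count in its low six bits and is followed by the value to repeat; any other byte is a literal) and a `decoded_size` computed as BytesPerLine * NumBitPlanes * (YEnd - YStart + 1); `pcx_rle_end` is a hypothetical name. Because of the padding mentioned in the Wikipedia quote, some files may still have compressed bytes after the offset this returns.

```c
#include <stddef.h>
#include <stdint.h>

/* Skips over PCX RLE data starting at blob[start] without storing pixels.
   On success stores the offset just past the compressed data in *end and
   returns 0; returns -1 if the data runs off the end of the blob. */
static int pcx_rle_end(const uint8_t *blob, size_t len, size_t start,
                       size_t decoded_size, size_t *end)
{
    size_t pos = start;
    size_t produced = 0;              /* decoded bytes accounted for so far */

    while (produced < decoded_size) {
        if (pos >= len)
            return -1;                /* truncated stream */
        uint8_t b = blob[pos++];
        if ((b & 0xC0) == 0xC0) {     /* top two bits set: run-count byte */
            if (pos >= len)
                return -1;
            pos++;                    /* skip the repeated value unread */
            produced += b & 0x3F;     /* run length, 0..63 */
        } else {
            produced += 1;            /* literal pixel byte */
        }
    }
    *end = pos;
    return 0;
}
```

For example, with the question's header the call would be `pcx_rle_end(blob, len, header_offset + 128, 320 * 1 * 200, &end)`.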

Without knowing much about the PCX file format, I can take a best guess at this:

```c
bytesAfterHeader = header.BytesPerLine * header.VertRes;
```
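For reference, a sketch of the decoded-size arithmetic with two assumptions flagged: the height comes from YEnd - YStart + 1 rather than VertRes (the ZSoft specification describes that field as vertical DPI, although for this question's header both give 200), and the NumBitPlanes factor is included so multi-plane images are covered. With the question's values this is 320 * 1 * 200 = 64000 bytes. That is the size of the decoded image, not the length of the compressed data in the file. `pcx_decoded_size` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of the fully decoded image data, computed from header fields.
   Assumes y_end >= y_start (a valid header). */
static size_t pcx_decoded_size(uint16_t bytes_per_line, uint8_t num_bit_planes,
                               uint16_t y_start, uint16_t y_end)
{
    size_t height = (size_t)y_end - y_start + 1;   /* YEnd is inclusive */
    return (size_t)bytes_per_line * num_bit_planes * height;
}
```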
Ron Warholic
  • This is my reference for the PCX file format. http://www.fileformat.info/format/pcx/egff.htm – links77 Nov 06 '09 at 19:27
  • That is not correct: RLE data has a variable length, and the number from the header is only the maximum size you need for the decode buffer. – Spektre Apr 25 '17 at 17:14
  • @Spektre Not even; in the worst case the compressed output _can_ be larger than the final image, and PCX has no official support for saving images uncompressed. – Nyerguds Aug 05 '19 at 11:20
  • @Nyerguds Your comment makes no sense to me. You probably missed the point that PCX has no file size stored in it, just the size of the line buffer needed, which is not enough. So the only way to detect the real file size is to uncompress it!!! Using some "safe" size will work only for the first file in the package and will corrupt all the subsequent ones... – Spektre Aug 05 '19 at 11:52
  • Ah, sorry, I misinterpreted that. The calculation above in fact should give the _exact_ size of the full decompression buffer, not the "max value". I was trying to say that the _compressed content_ may, in some cases, be even larger than that, if the data compressed poorly. You are absolutely right in saying that that has no relation to finding the end of the compressed data, though. The only way to find that is to decompress it, and then to see if there's still a palette behind that compressed data. – Nyerguds Aug 05 '19 at 11:58
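Putting the comments' conclusion together: skip over the compressed data (counting, not decoding), check whether a palette follows, and copy exactly that byte range to a new file. A minimal sketch in C, assuming the header offset is already known, that a 256-color palette appears only for version 5, 8-bit, single-plane images as a 0x0C marker plus 768 bytes directly after the compressed data, and that no padding sits between the two (if it does, the palette is silently left out). `pcx_extract` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Copies one embedded PCX file, starting at blob[off], to 'path' without
   decoding pixels. Returns the number of bytes written, or 0 on failure. */
static size_t pcx_extract(const uint8_t *blob, size_t len, size_t off,
                          const char *path)
{
    if (off > len || len - off < 128 || blob[off] != 0x0A || blob[off + 2] != 1)
        return 0;
    const uint8_t *h = blob + off;
    uint16_t y_start = (uint16_t)(h[6] | (h[7] << 8));    /* little-endian */
    uint16_t y_end   = (uint16_t)(h[10] | (h[11] << 8));
    uint16_t bpl     = (uint16_t)(h[66] | (h[67] << 8));
    if (y_end < y_start)
        return 0;
    size_t need = (size_t)bpl * h[65] * ((size_t)y_end - y_start + 1);

    /* Skip the RLE stream, counting decoded bytes only. */
    size_t pos = off + 128, produced = 0;
    while (produced < need) {
        if (pos >= len)
            return 0;                      /* truncated data */
        uint8_t b = blob[pos++];
        if (b >= 0xC0) {                   /* run-count byte + value byte */
            if (pos >= len)
                return 0;
            pos++;
            produced += b & 0x3F;
        } else {
            produced++;                    /* literal byte */
        }
    }

    /* Optional trailing palette: 0x0C marker + 256 RGB triples. */
    if (h[1] == 5 && h[3] == 8 && h[65] == 1 &&
        pos < len && len - pos >= 769 && blob[pos] == 0x0C)
        pos += 769;

    FILE *f = fopen(path, "wb");
    if (!f)
        return 0;
    size_t size = pos - off;
    size_t written = fwrite(blob + off, 1, size, f);
    if (fclose(f) != 0 || written != size)
        return 0;
    return size;
}
```

For example, `pcx_extract(blob, blob_len, header_offset, "image.pcx")` returns the number of bytes copied, or 0 if the data looks truncated or the file cannot be written.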