
I am trying to read the Netpbm image format, following the specification explained here. I can read the ASCII variants of the format (magic numbers P1, P2 and P3) without problems. But I have trouble reading the binary data in the other variants (magic numbers P4, P5 and P6), even though I can read the file header (which is ASCII) without problems.

In the link, it is stated that:

In the binary formats, PBM uses 1 bit per pixel, PGM uses 8 or 16 bits per pixel, and PPM uses 24 bits per pixel: 8 for red, 8 for green, 8 for blue. Some readers and writers may support 48 bits per pixel (16 each for R,G,B), but this is still rare.

With this, I tried to use this answer to read the data bit by bit, and ended up with this code:

if(*this->magicNumber == "P4") {
  this->pixels = new Matrix<int>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string byte;
      stringstream ss(line_pixels);
      while(getline(ss, byte)) {
        unsigned char c = (unsigned char)byte.at(0);
        for(int x=0; x != 8; x++) p.push_back( (c & (1 << x)) != 0 );
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      this->pixels->set(i, j, p[count++]);
    }
  }
}

But when I try to read the image named sample_640×426.pbm from this link, I should get this result:

[image: expected result]

but I am getting this result instead:

[image: actual result]

For the binary PGM and PPM formats, I get a segmentation fault when I try to open the image. It happens at some point during the loop, when count is incremented. I think the size of vector<int> p somehow ends up bigger than the expected product width x height.

The code for the PGM format:

if(*this->magicNumber == "P5") {
  this->pixels = new Matrix<int>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string number;
      stringstream ss(line_pixels);
      while(getline(ss, number)) {
        unsigned char data = (unsigned char)number.at(0);
        p.push_back((int)data);
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      this->pixels->set(i, j, p[count++]);
    }
  }
}

The code for the PPM format:

if(*this->magicNumber == "P6") {
  this->pixels = new Matrix<struct Pixel>(this->width, this->height);

  vector<int> p;
  while(getline(file, line_pixels)) {
    if(line_pixels.size() > 0 && line_pixels.at(0) != '#') {
      string byte;
      stringstream ss(line_pixels);
      while(getline(ss, byte)) {
        unsigned char data = (unsigned char)byte.at(0);
        p.push_back((int)data);
      }
    }
  }

  int count = 0;
  for(int i=0; i<height; i++) {
    for(int j=0; j<width; j++) {
      struct Pixel pixel;
      pixel.r = p[count++];
      pixel.g = p[count++];
      pixel.b = p[count++];
      this->pixels->set(i, j, pixel);
    }
  }
}

Can anyone give me a hint about what I am doing wrong here?

Kleber Mota
  • You can read an ASCII (or partially ASCII) encoded file as binary and properly interpret the bytes as ASCII in your program. But it's generally not a good idea to read a binary file that is not ASCII encoding as if it were ASCII (e.g., as "strings"). – lurker Apr 17 '22 at 13:13

1 Answer

while(getline(file, line_pixels)) {

std::getline reads from the input stream until a newline character is read.

A file is a file. It contains bytes. Whether you believe the file contains text, or binary, is purely a matter of interpretation.

Text lines are terminated by a newline character. That's what std::getline does: it reads bytes from a file until it reads a newline character. Whatever gets read, goes into the std::string parameter.

This would be very confusing if your intent is to read some binary data, like an image. A byte containing the same value as a newline character can occur naturally in a binary file like an image file, representing the appropriate pixel values. Using std::getline to read non-textual data always ends in tears.
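To make the problem concrete, here is a minimal sketch that feeds six raw bytes through the same kind of std::getline loop. The helper name getline_chunk_sizes is made up for this illustration and is not part of the question's code. Every byte with the value 0x0A is swallowed as a delimiter, so the data comes back as arbitrary chunks, some of them empty.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits a buffer with std::getline, the way the
// question's loop does, and returns the length of each chunk it yields.
std::vector<std::size_t> getline_chunk_sizes(const std::string& raw) {
    std::istringstream in(raw);
    std::vector<std::size_t> sizes;
    std::string chunk;
    while (std::getline(in, chunk)) {
        // Bytes equal to '\n' (0x0A) never reach this point: getline
        // consumes them as delimiters and drops them.
        sizes.push_back(chunk.size());
    }
    return sizes;
}
```

For the six bytes 0x10 0x0A 0x0A 0x7F 0x0A 0x20 this yields chunks of length 1, 0, 1 and 1, so three of the six bytes are lost. The loops in the question additionally keep only byte.at(0) of each chunk and skip chunks that are empty or start with '#' (0x23), so p most likely ends up with fewer entries than the nested loops expect, and p[count++] then reads past the end of the vector.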

This would only make sense in one situation: if you already know, in advance, that the binary data you intend to read here ends with a byte that just happens to be the newline character, and that newline character appears nowhere else.

But, of course, in an image file, you have no such guarantees, whatsoever.

When reading image data, you are typically expected to read a specific number of bytes from the file.

Here, you know, in advance, the size of your image and its format. Based on that you can calculate using a simple mathematical formula how many bytes you expect to read.
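As a rough sketch of that formula, based on my reading of the Netpbm format descriptions: P4 packs 8 pixels per byte and pads every row to a whole byte, while P5 and P6 use one byte per sample when maxval is below 256 and two bytes otherwise. The function name raster_bytes and its parameters are assumptions for this illustration, not something from the question's class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: number of raster bytes that follow the header of a
// binary Netpbm file. maxval is ignored for P4, which has no maxval field.
std::size_t raster_bytes(const std::string& magic, std::size_t width,
                         std::size_t height, unsigned maxval) {
    // One byte per sample up to maxval 255, two bytes above that.
    const std::size_t bytes_per_sample = (maxval < 256) ? 1 : 2;
    if (magic == "P4") {
        // 1 bit per pixel, each row rounded up to a whole number of bytes.
        return ((width + 7) / 8) * height;
    }
    if (magic == "P5") {
        return width * height * bytes_per_sample;      // one gray sample
    }
    if (magic == "P6") {
        return width * height * 3 * bytes_per_sample;  // R, G and B samples
    }
    throw std::invalid_argument("not a binary Netpbm magic number: " + magic);
}
```

For the 640 x 426 sample image this gives 80 * 426 = 34080 bytes as P4, 272640 bytes as P5 with maxval 255, and 817920 bytes as P6 with maxval 255.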

And that happens to be what std::istream's read() method does: read the specific number of bytes from a file. This link provides more information.

You need to replace all shown code that improperly uses getline with one that uses read().

Sam Varshavchik
  • My issue is that I do not know the size of the binary data, and I have already read some ASCII data from the header. And yes, if I understood the format specification correctly, a newline character is present at the end of each line in the binary part of the file. – Kleber Mota Apr 17 '22 at 12:51
  • You'll be surprised to learn that you already know the size of the binary data: `this->pixels = new Matrix(this->width, this->height);`. You calculated the size of the binary data in pixels (and just one more step will calculate the size of the binary data in bytes). And, based on the reference information, I conclude that the `P4`, `P5`, and `P6` formats must be binary formats, not text formats. No newlines, no getlines, just reads. – Sam Varshavchik Apr 17 '22 at 12:54