I am working on an embedded deep learning inference project in C++ using TensorRT. My model requires subtracting the mean image.

The API that I'm using lets me define a mean image for RGB images with the following data structure:

uint8_t *data[DW_MAX_IMAGE_PLANES];       // raw image data 
size_t pitch;                             // pitch of the image in bytes
uint32_t height;                          // height of the image in px
uint32_t width;                           // image width in px
uint32_t planeCount;                      // plane count of the image

So far I have found the library LodePNG, which I think is quite useful for this task. It can load PNGs in just a few lines:

// Load file and decode image.
std::vector<unsigned char> image;
unsigned width, height;
unsigned error = lodepng::decode(image, width, height, filename);

The question now is how to convert the std::vector<unsigned char> to uint8_t *[DW_MAX_IMAGE_PLANES], and how to calculate the pitch and planeCount values.

As I'm using RGB images, DW_MAX_IMAGE_PLANES equals 3.

Ashwin Nanjappa
johni07
  • The title of your question seems kinda wrong, since you loaded the png already successfully. – Hannes Hauptmann Jun 19 '17 at 09:36
  • Perhaps a [`std::vector`](http://en.cppreference.com/w/cpp/container/vector) reference might be useful? There are a few ways to get a pointer to the data managed by the vector, including (but not limited to) getting a pointer to its first element. – Some programmer dude Jun 19 '17 at 09:37
  • As for the rest, doesn't the library you have supply you with that meta-data? – Some programmer dude Jun 19 '17 at 09:38
  • Use reinterpret_cast as described here: https://stackoverflow.com/questions/4254615/how-to-cast-vectorunsigned-char-to-char Just make sure that unsigned char and uint8_t are the same size. – Marek Vitek Jun 19 '17 at 09:46
  • @MarekVitek On a platform where `uint8_t` exists, it's extremely unlikely that it's not the same as `unsigned char`. If `char` is not 8 bits then there can't really be an `int8_t` type. – Some programmer dude Jun 19 '17 at 09:49
  • You mean like on platforms where char is 7-bit? :-) Never say never. But you are right, this is highly unlikely, and on modern platforms the size is the same. – Marek Vitek Jun 19 '17 at 09:54
  • @Someprogrammerdude No, LodePNG does not supply any planeCount or pitch values. – johni07 Jun 19 '17 at 10:27
  • @Someprogrammerdude: I agree about `uint8_t`, but `int8_t` could be a different type than `signed char`. **In theory**. Practically, I have yet to find such a system and I don't think it makes much sense. – too honest for this site Jun 19 '17 at 11:13
  • Please note: I have updated the expected format for the raw data. It is now uint8_t *data[DW_MAX_IMAGE_PLANES] – johni07 Jun 19 '17 at 12:12
  • The question of how to calculate pitch has already [been answered](https://stackoverflow.com/questions/20041509/how-to-calculate-pitch-of-an-image-visual-studio). – Marek Vitek Jun 19 '17 at 21:59

1 Answer

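The values for pitch and planeCount are simple. LodePNG's decode defaults to a bit depth of 8, so each sample occupies one byte. Pitch is conventionally the number of bytes from the start of one row to the start of the next, so for a tightly packed single-channel plane it equals width (more if the API requires padded rows). And because the image is RGB, the value of planeCount is 3: one plane for each color.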

Since you are not using the alpha channel, you should probably have LodePNG decode directly into RGB format instead of its default, RGBA:

unsigned error = lodepng::decode(image, width, height, filename, LCT_RGB);

But once the image is decoded into the std::vector<unsigned char>, you will not be able to use it directly. The decoded data from LodePNG is interleaved, in the following format:

image -> R0, G0, B0, R1, G1, B1, R2, G2, B2, ...

But you need it in the following format:

data[0] -> R0, R1, R2, ...
data[1] -> G0, G1, G2, ...
data[2] -> B0, B1, B2, ...

If you are memory constrained, you'll have to rearrange the values inside the image vector itself into planar order (R0, R1, ... Rn, G0, G1, ... Gn, B0, B1, ... Bn) and then point each entry of the data array at the start of its plane within that vector.
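A minimal sketch of that in-place rearrangement, assuming the vector holds tightly packed 8-bit RGB as produced by decoding with LCT_RGB. The names deinterleaveInPlace and planePointers are made up for illustration and are not part of LodePNG or of your API. It follows the cycles of the index permutation and uses one bit of bookkeeping per byte instead of a second image buffer:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// The pointer conversion in planePointers relies on uint8_t being
// unsigned char, which holds on mainstream platforms.
static_assert(std::is_same<std::uint8_t, unsigned char>::value,
              "uint8_t is expected to be unsigned char");

// Hypothetical helper: rearranges R0,G0,B0,R1,G1,B1,... into
// R0..Rn,G0..Gn,B0..Bn inside the same buffer by following the cycles
// of the index permutation. Extra memory: one bit per image byte.
void deinterleaveInPlace(std::vector<unsigned char>& buf) {
    const std::size_t pixels = buf.size() / 3;  // size is expected to be 3 * pixel count
    const std::size_t total = pixels * 3;
    std::vector<bool> placed(total, false);

    for (std::size_t start = 0; start < total; ++start) {
        if (placed[start]) continue;
        std::size_t cur = start;
        unsigned char carried = buf[start];
        do {
            // Interleaved index cur = 3 * pixel + channel moves to
            // channel * pixels + pixel in the planar layout.
            const std::size_t dest = (cur % 3) * pixels + cur / 3;
            std::swap(carried, buf[dest]);
            placed[dest] = true;
            cur = dest;
        } while (cur != start);
    }
}

// Hypothetical helper: after deinterleaving, each plane starts at a
// fixed offset in the same buffer. The pointers stay valid only while
// the vector is neither resized nor destroyed.
std::array<std::uint8_t*, 3> planePointers(std::vector<unsigned char>& buf) {
    const std::size_t pixels = buf.size() / 3;
    return {{buf.data(), buf.data() + pixels, buf.data() + 2 * pixels}};
}
```

The three returned pointers can be copied into data[0], data[1] and data[2]. The cycle walk jumps around in memory, so it is likely slower than a plain copy; it only pays off when a second buffer really does not fit.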

If you have available memory, you can create separate vectors for each of the three color channels. Then copy the data from the decoded image and initialize the data array with pointers to the first element of the vectors.
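A rough sketch of the separate-vectors approach follows. The MeanImage struct only mirrors the fields listed in the question, and PlanarRgb, splitRgb and makeMeanImage are hypothetical names, so substitute the real type from your API. It assumes the input was decoded with LCT_RGB (3 bytes per pixel) and that pitch means the number of bytes per row of a tightly packed plane:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the API's definitions shown in the question.
constexpr std::size_t DW_MAX_IMAGE_PLANES = 3;

struct MeanImage {
    std::uint8_t *data[DW_MAX_IMAGE_PLANES];  // raw image data
    std::size_t pitch;                        // pitch of the image in bytes
    std::uint32_t height;                     // height of the image in px
    std::uint32_t width;                      // image width in px
    std::uint32_t planeCount;                 // plane count of the image
};

// Owns the per-channel storage. It must outlive any MeanImage that
// points into it.
struct PlanarRgb {
    std::vector<std::uint8_t> planes[3];
};

// Copies interleaved R,G,B,R,G,B,... bytes into three separate planes.
PlanarRgb splitRgb(const std::vector<unsigned char>& interleaved,
                   unsigned width, unsigned height) {
    const std::size_t pixels = static_cast<std::size_t>(width) * height;
    if (interleaved.size() != pixels * 3) {
        throw std::invalid_argument("expected 3 bytes per pixel (LCT_RGB)");
    }
    PlanarRgb out;
    for (auto& plane : out.planes) plane.resize(pixels);
    for (std::size_t i = 0; i < pixels; ++i) {
        out.planes[0][i] = interleaved[3 * i];      // red
        out.planes[1][i] = interleaved[3 * i + 1];  // green
        out.planes[2][i] = interleaved[3 * i + 2];  // blue
    }
    return out;
}

// Fills the struct with pointers to the first element of each plane.
MeanImage makeMeanImage(PlanarRgb& storage, unsigned width, unsigned height) {
    MeanImage img{};
    for (std::size_t c = 0; c < 3; ++c) img.data[c] = storage.planes[c].data();
    img.pitch = width;  // assumption: one byte per sample, no row padding
    img.height = height;
    img.width = width;
    img.planeCount = 3;
    return img;
}
```

Typical use would be to call splitRgb on the vector that lodepng::decode filled, keep the returned PlanarRgb alive for as long as the API reads the mean image, and pass the result of makeMeanImage to the API.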

D Krueger