
Background information: I need to load some 16 bit grayscale PNGs.

Does Caffe support loading 16 bit images through the ImageDataLayer?

After some googling, the answer seems to be that it doesn't. The ImageDataLayer relies on this I/O routine:

cv::Mat ReadImageToCVMat(const string& filename,
    const int height, const int width, const bool is_color) {
  cv::Mat cv_img;
  int cv_read_flag = (is_color ? CV_LOAD_IMAGE_COLOR :
    CV_LOAD_IMAGE_GRAYSCALE);
  cv::Mat cv_img_origin = cv::imread(filename, cv_read_flag);
  if (!cv_img_origin.data) {
    LOG(ERROR) << "Could not open or find file " << filename;
    return cv_img_origin;
  }
  if (height > 0 && width > 0) {
    cv::resize(cv_img_origin, cv_img, cv::Size(width, height));
  } else {
    cv_img = cv_img_origin;
  }
  return cv_img;
}

This routine uses OpenCV's cv::imread function, which reads the input as 8-bit unless the appropriate flag is set:

CV_LOAD_IMAGE_ANYDEPTH - If set, return 16-bit/32-bit image when the input has the corresponding depth, otherwise convert it to 8-bit.

Simply adding the appropriate flag will not work, because later on the code [io.cpp] checks for 8-bit depth:

void CVMatToDatum(const cv::Mat& cv_img, Datum* datum) {
  CHECK(cv_img.depth() == CV_8U) << "Image data type must be unsigned byte";
... }

I could just remove the check, but I'm afraid it's there for a reason and that removing it might produce unpredictable results. Can anybody shed light on this issue?
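To illustrate why the check is probably there: the read loops quoted on this page walk each row through a `uchar` pointer, so handing them 16-bit data would plausibly yield individual bytes rather than pixel values. The following is a minimal standalone sketch of that effect, not Caffe code; `read_row_as_8bit` and `read_row_as_16bit` are hypothetical names, and a `std::vector` stands in for a `cv::Mat` row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mimics an 8-bit read loop that is handed a row of 16-bit pixels:
// it walks the row byte by byte and reads only row.size() bytes.
// (Hypothetical helper, not part of Caffe or OpenCV.)
std::vector<float> read_row_as_8bit(const std::vector<std::uint16_t>& row) {
  const unsigned char* bytes =
      reinterpret_cast<const unsigned char*>(row.data());
  std::vector<float> out;
  for (std::size_t i = 0; i < row.size(); ++i) {
    out.push_back(static_cast<float>(bytes[i]));  // one byte, not one pixel
  }
  return out;
}

// Reads the same row with the element type that matches the data.
// (Hypothetical helper, not part of Caffe or OpenCV.)
std::vector<float> read_row_as_16bit(const std::vector<std::uint16_t>& row) {
  std::vector<float> out;
  for (std::uint16_t value : row) {
    out.push_back(static_cast<float>(value));
  }
  return out;
}
```

For a row such as `{1000, 65535}`, the 16-bit reader returns the original values, while the 8-bit reader can never return anything above 255 and covers only half of the row's bytes. So removing the check alone would not be enough; the read loop has to change as well.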

Shai
Enoon

2 Answers


You can patch ImageDataLayer to read 16bit images like this:

  1. Add appropriate flag as you mentioned (io.cpp):

after

int cv_read_flag = (is_color ? CV_LOAD_IMAGE_COLOR :
    CV_LOAD_IMAGE_GRAYSCALE);

add

cv_read_flag |= CV_LOAD_IMAGE_ANYDEPTH;

  2. Modify the check you mentioned (data_transformer.cpp):

this

CHECK(cv_img.depth() == CV_8U) << "Image data type must be unsigned byte";

becomes

CHECK(cv_img.depth() == CV_8U || cv_img.depth() == CV_16U) << "Image data type must be uint8 or uint16";
bool is16bit = cv_img.depth() == CV_16U;

  3. Modify the way DataTransformer reads cv::Mat like this (same function, further down):

next to the existing pointer

const uchar* ptr = cv_cropped_img.ptr<uchar>(h);

add a second pointer of type uint16_t:

const uint16_t* ptr_16 = cv_cropped_img.ptr<uint16_t>(h);

Then read using appropriate pointer:

Dtype pixel = static_cast<Dtype>(ptr[img_index++]);

becomes

Dtype pixel;
if(is16bit)
    pixel = static_cast<Dtype>(ptr_16[img_index++]);
else
    pixel = static_cast<Dtype>(ptr[img_index++]);
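
The per-pixel branch above can also be hoisted out of the inner loop by choosing the element type once per row. The following is a minimal standalone sketch of that idea, assuming a raw byte buffer as a stand-in for the `cv::Mat` row; `append_row` and `row_to_float` are hypothetical names, not Caffe or OpenCV functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads `count` pixels of type Pixel from a raw row buffer and appends
// them to `out` as float. memcpy avoids alignment and aliasing problems.
// (Hypothetical helper, not part of Caffe or OpenCV.)
template <typename Pixel>
void append_row(const unsigned char* row_bytes, std::size_t count,
                std::vector<float>* out) {
  for (std::size_t i = 0; i < count; ++i) {
    Pixel p;
    std::memcpy(&p, row_bytes + i * sizeof(Pixel), sizeof(Pixel));
    out->push_back(static_cast<float>(p));
  }
}

// Chooses the element type once per row instead of once per pixel.
// `is16bit` plays the role of the flag computed after the depth check.
std::vector<float> row_to_float(const unsigned char* row_bytes,
                                std::size_t count, bool is16bit) {
  std::vector<float> out;
  out.reserve(count);
  if (is16bit) {
    append_row<std::uint16_t>(row_bytes, count, &out);
  } else {
    append_row<std::uint8_t>(row_bytes, count, &out);
  }
  return out;
}
```

Whether this is measurably faster than branching per pixel is untested here; the main point is that both depths share one code path after the type is chosen.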
Dzugaru

By default, Caffe works with float32 values. An image is usually represented as a C-by-H-by-W blob, where C=3 for three color channels (C=1 for grayscale). So working in float32 lets you handle uint16 images, provided you convert them to float32 properly.
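
As a rough illustration of such a conversion, here is a minimal standalone sketch that turns an interleaved (H-by-W-by-C) uint16 buffer into a planar C-by-H-by-W float32 buffer. The function name `hwc_u16_to_chw_f32` and the `scale` parameter are assumptions made for this example, not Caffe API. Since float32 has a 24-bit significand, every uint16 value converts without loss.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved H x W x C uint16 image into a planar
// C x H x W float32 buffer, optionally multiplying by `scale`
// (for example 1.0f / 65535.0f to map values into [0, 1]).
// (Hypothetical helper, not part of Caffe or OpenCV.)
std::vector<float> hwc_u16_to_chw_f32(const std::vector<std::uint16_t>& src,
                                      std::size_t channels,
                                      std::size_t height,
                                      std::size_t width,
                                      float scale = 1.0f) {
  std::vector<float> dst(channels * height * width);
  for (std::size_t h = 0; h < height; ++h) {
    for (std::size_t w = 0; w < width; ++w) {
      for (std::size_t c = 0; c < channels; ++c) {
        const std::size_t src_index = (h * width + w) * channels + c;
        const std::size_t dst_index = (c * height + h) * width + w;
        dst[dst_index] = scale * static_cast<float>(src[src_index]);
      }
    }
  }
  return dst;
}
```

The caller is expected to pass a buffer holding exactly `channels * height * width` values; the sketch does not validate that.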

I do not have personal experience with "ImageData" layer, so I cannot comment on how you can or cannot load uint16 image data using this layer.

However, you might find the "HDF5Data" layer useful: you can read your images externally, convert them to the HDF5 data format (which supports float32), and then feed the converted data to Caffe via the "HDF5Data" layer.

You can find more information on "HDF5Data" layer here and here.

Shai
  • HDF5Data is clearly a possibility, but images have an edge, IMO, in that you can better visualize what is going on. I'm also using NVIDIA DIGITS, which has some nice extra visualization features. What I could do is create a new layer type that returns float32 data... any hints on where to start doing this? – Enoon Mar 20 '16 at 11:49
  • IMHO, if you implement a new layer, you are back to square one with respect to visualization and the extra features DIGITS offers, so why not use HDF5Data layer that you don't need to implement? – Shai Mar 20 '16 at 11:57
  • Uhm, you are right. Either I modify the original layer or it's better to use another format altogether. Any specific reason to prefer HDF5 over LMDB? – Enoon Mar 20 '16 at 12:08
  • @Shai: Does Caffe support images with more than 3 channels as inputs? I mean, can I use an input matrix with more than 3 channels (e.g. 20)? Or am I stuck with 1-channel and 3-channel images as input? – Hossein Nov 20 '16 at 21:00
  • @Hossein you are not restricted at all. You can have as many channels as you want – Shai Nov 21 '16 at 05:52
  • @Shai: Thank you very much. By the way, why don't you create a Caffe tutorial? You seem to know everything about it, and there is not even a single video tutorial about Caffe on YouTube. If you had the time, this would help a great number of people. Anyway, I'm really grateful, thanks – Hossein Nov 21 '16 at 06:11
  • @Hossein nice idea. However, I am more into textual tutorials. You can find some of them here at SO – Shai Nov 21 '16 at 06:25
  • @Shai: That's great as well. It would be great to have your articles/tutorials on your blog. To be honest, digging and searching among questions to learn how to work with Caffe is very hard. You miss a lot of stuff this way. Anyway, I wish you the best, man. God bless you – Hossein Nov 21 '16 at 06:28