
I'm sending images over the network (from Python) and want to create OpenCV Mats from them on the receiving end (in C++).

They are created like this:

image = self.camera.capture_image()   # np.array of dtype np.uint8
h, w, c = image.shape   # 4 channels
image = np.transpose(image, (2, 0, 1)) # transpose because channels come first in OpenCV (?)
image = np.ascontiguousarray(image, dtype='>B')  # big-endian bytes
bytess = image.tobytes(order='C')

After this, I should have a byte buffer in which the three dimensions are flattened: for each channel, the rows are appended one after another, and the channels are then concatenated to form the final buffer. I have verified that the following holds:

bytess[channel*height*width + i*width + j] == image[channel, i, j]

[I think the above part is actually unimportant, because if it's incorrect, I will get an incorrectly displayed image, but at least I would have an image, which is one step further than I am now.]

Now on the other side I am trying to do this:

char* pixel_data = … // retrieve array of bytes from message
// assume height, width and channels are known
const int sizes[3] = {channels, width, height};
const size_t steps[3] = {(size_t)height * (size_t)width, (size_t)height};
cv::Mat image(3, sizes, CV_8UC1, pixel_data, steps);

So, I create a Matrix with three dimensions where the element type is byte. I am not so sure I'm determining the steps correctly, but I think it matches the documentation.

But running this just crashes with

error: (-5:Bad argument) Unknown array type in function 'cvarrToMat'

What is the correct way to serialise an RGBA (or BGRA for OpenCV) image to a byte buffer and create a cv::Mat from it with the C++ API?

oarfish
  • I will think about this later on, but for the moment, I did something related that may give you an idea here... https://stackoverflow.com/a/55313342/2836621 – Mark Setchell May 29 '19 at 10:23
  • Are you sure you have the network bandwidth to do this? Even a 640x480 RGB image is nearly 1MB, and a 100Mb/s Ethernet can maximally deliver 8MB/s, so is 8 frames per second enough? Or do you need to JPEG encode and get 20x that? – Mark Setchell May 29 '19 at 10:57
  • @MarkSetchell It's being sent over localhost, so that issue is not important right now. I am purely interested in getting the deserialization to work. – oarfish May 29 '19 at 11:03
  • Where is your sending and receiving code? – Mark Setchell May 29 '19 at 11:32
  • Why all the shenanigans, the in-memory layout of numpy arrays and `cv::Mat` is identical, you just need to take the underlying data buffer and make a `cv::Mat` header for it with the same dimensions and element data type. Minimal overhead, that's how the Python OpenCV bindings work. – Dan Mašek May 29 '19 at 12:55
  • Interesting, did not know that. So in principle, just dumping `np.ascontiguousarray(image).tobytes()` should be enough? – oarfish May 29 '19 at 13:36
  • Yeah, I'd say so. I'm not sure what exactly `self.camera` is, but there is a pretty good chance the the array is already contiguous as well. Also, if the images are coming from an actual camera, then you could safely ditch the alpha layer and reduce the amount of data you need to send. – Dan Mašek May 29 '19 at 18:11
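
The approach suggested in the comments can be sketched in Python (the frame here is a hypothetical stand-in for `self.camera.capture_image()`; on the C++ side the receiving step would correspond to the `cv::Mat(height, width, CV_8UC4, pixel_data)` constructor):

```python
import numpy as np

# Hypothetical RGBA frame standing in for self.camera.capture_image()
h, w, c = 480, 640, 4
image = np.random.randint(0, 256, size=(h, w, c), dtype=np.uint8)

# Sender: numpy's default C-contiguous layout is already row-major with
# interleaved channels, which matches cv::Mat's in-memory layout.
payload = np.ascontiguousarray(image).tobytes()

# Receiver: rebuild the array from the raw bytes plus the known shape.
# In C++ this step would be cv::Mat(h, w, CV_8UC4, pixel_data).
restored = np.frombuffer(payload, dtype=np.uint8).reshape(h, w, c)

assert np.array_equal(restored, image)
```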

1 Answer


I have one solution, which circumvents the problem. This line here:

cv::Mat image(3, sizes, CV_8UC1, pixel_data, steps);

assumes that I can describe the data as three dimensions of individual bytes, but I could not make this work.

Instead using a different constructor

cv::Mat image(height, width, CV_8UC4, pixel_data);

I can treat the image as two-dimensional but with a vector element type (4-byte elements instead of scalar bytes). If the pixel_data pointer is in the correct layout, this works.

The correct layout is not really explicitly documented, but can be deduced from one of the official tutorials:

[Image from the OpenCV tutorial: diagram of how a multi-channel Mat is stored in memory, with the channel values of each pixel interleaved along every row]

So the data is stored such that one row comes after the other and each element of a row is split into n_channels elements. Using a data type such as CV_8UC4 makes the matrix read 4 bytes at each position in the raw data array, and advance the pointer 4 bytes.
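
That interleaved layout can be checked numerically: for a row-major buffer with interleaved channels, the byte at index `(i*width + j)*channels + k` is pixel `(i, j)`, channel `k` (a small sanity check, not from the original post):

```python
import numpy as np

h, w, c = 3, 5, 4
image = np.arange(h * w * c, dtype=np.uint8).reshape(h, w, c)
buf = image.tobytes(order='C')  # row-major, channels interleaved

# Each pixel's c channel bytes sit next to each other in the buffer.
for i in range(h):
    for j in range(w):
        for k in range(c):
            assert buf[(i * w + j) * c + k] == image[i, j, k]
```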

So in this case, I just have to rearrange the numpy array into the appropriate sequence: append rows together, but interleave the channels. I did this like so, but I hope there's a way without looping.

def array_to_cv_bytes(ary):
    assert ary.ndim == 3, 'Array must have 3 dimensions'
    h, w, c = ary.shape
    ary = ary[..., (2, 1, 0, 3)]  # RGBA -> BGRA
    output = np.empty(h * c * w, dtype=np.uint8)
    for channel_idx in range(c):
        output[channel_idx::c] = ary[..., channel_idx].reshape(h * w)
    return output.tobytes(order='C')
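
Since that interleaved order is exactly numpy's default C-contiguous layout for an (h, w, c) array, the loop can likely be dropped: reindexing the channels and serializing directly should produce the same bytes (a sketch equivalent to the function above; the function name is mine):

```python
import numpy as np

def array_to_cv_bytes_noloop(ary):
    """Channel-swap RGBA -> BGRA and dump in row-major interleaved order."""
    assert ary.ndim == 3, 'Array must have 3 dimensions'
    # Fancy-indexing the last axis reorders the channels and yields a new
    # array; tobytes(order='C') then emits the rows one after another with
    # the channels of each pixel interleaved -- the layout cv::Mat expects.
    return ary[..., (2, 1, 0, 3)].tobytes(order='C')
```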
oarfish
  • "The correct layout is not really explicitly documented" -- it is, right at the beginning of [`cv::Mat` documentation](https://docs.opencv.org/4.0.0/d3/d63/classcv_1_1Mat.html#details)... – Dan Mašek May 29 '19 at 12:45
  • Yes correct, I read that as well, but that didn't help me in going the other way with multiple channels. Perhaps I just don't understand it. – oarfish May 29 '19 at 13:27
  • I think what mainly confused me was the additional complication of the element size, which may not be scalar and which the addressing formula neglects (I guess it's implicit in the `+` operator like in C) – oarfish May 29 '19 at 15:22