Store struct containing vector and cv::Mat to disk - Data serialization in C++

Question

I'd like to store the structure below on disk and be able to read it back (C++):

struct pixels {
    std::vector<cv::Point> indexes;
    cv::Mat values;
};

I've tried to use ofstream and ifstream, but they need the size of the variable, which I don't know how to calculate in this situation. It's not a simple struct with a few int and double members. Is there any way to do it in C++, preferably without using any third-party libraries?

(I'm coming from MATLAB, where this was easy to do using save: save(filename, variables).)

Edit:
I've just tried Boost Serialization. Unfortunately it's very slow for my use.

`indexes.size()` gives you the number of elements in the vector, and `values.rows * values.cols` gives you the number of elements in the cv::Mat. It takes a little bit of work, but as a suggestion: first write a "header" with the size of the vector and the rows, cols and type of values, then write all the values of indexes, and then all the values of the mat. The header will be needed to read it back. You may take a look at [this answer](https://codereview.stackexchange.com/questions/26344/writing-reading-data-structure-to-a-file-using-c); in your case, you may need a little bit more work — api55, Apr 27 '18 at 11:19
Why not take advantage of [OpenCV's `FileStorage`](https://docs.opencv.org/2.4/doc/tutorials/core/file_input_output_with_xml_yml/file_input_output_with_xml_yml.html)? — Dan Mašek, Apr 27 '18 at 17:15
Read through the tutorial I linked to -- it explains how to deal with custom data structures, how to deal with vectors, and how to deal with OpenCV data structures. | Define "fast". — Dan Mašek, Apr 27 '18 at 19:02

score 6 · Accepted Answer · answered Apr 28 '18 at 01:41

Several approaches come to mind, each with its own pros and cons.

- Use OpenCV's XML/YAML persistence functionality
  - XML format (portable)
  - YAML format (portable)
  - JSON format (portable)
- Use Boost.Serialization
  - Plain text format (portable)
  - XML format (portable)
  - Binary format (non-portable)
- Raw data to std::fstream
  - Binary format (non-portable)

By "portable" I mean that data files written on an arbitrary platform+compiler can be read on any other platform+compiler. By "non-portable" I mean that's not necessarily the case. Endianness matters, and compilers could possibly make a difference too. You could add additional handling for such situations at the cost of performance. In this answer, I'll assume you're reading and writing on the same machine.
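The additional handling mentioned above can be as simple as fixing the byte order in the file format. Here is a minimal sketch, assuming values are stored as 32-bit unsigned integers in little-endian order. The helpers `write_u32_le`, `read_u32_le` and `check_round_trip` are hypothetical and not part of the code in this answer:

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: store a 32-bit value as four bytes, least significant
// byte first, regardless of the host's endianness.
void write_u32_le(std::ostream& os, std::uint32_t value)
{
    char bytes[4];
    for (int i = 0; i < 4; ++i) {
        bytes[i] = static_cast<char>((value >> (8 * i)) & 0xFFu);
    }
    os.write(bytes, sizeof(bytes));
}

// Hypothetical helper: read four bytes back into a 32-bit value.
// Returns false if fewer than four bytes could be read.
bool read_u32_le(std::istream& is, std::uint32_t& value)
{
    char bytes[4];
    if (!is.read(bytes, sizeof(bytes))) {
        return false;
    }
    value = 0;
    for (int i = 0; i < 4; ++i) {
        value |= static_cast<std::uint32_t>(static_cast<unsigned char>(bytes[i])) << (8 * i);
    }
    return true;
}

// Round-trips one value through an in-memory stream and asserts it survives.
void check_round_trip(std::uint32_t value)
{
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write_u32_le(ss, value);
    std::uint32_t restored = 0;
    bool const ok = read_u32_le(ss, restored);
    assert(ok && restored == value);
    (void)ok; // silence unused-variable warnings when NDEBUG is defined
}
```

The per-value shifting is where the performance cost comes from: a large buffer can no longer be written with a single `write` call unless the host byte order already matches the file's.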

First, here are the includes, common data structures and utility functions we will use:

#include <opencv2/opencv.hpp>

#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/xml_oarchive.hpp>
#include <boost/archive/xml_iarchive.hpp>

#include <boost/filesystem.hpp>

#include <boost/serialization/vector.hpp>

#include <chrono>
#include <fstream>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// ============================================================================

using std::chrono::high_resolution_clock;
using std::chrono::duration_cast;
using std::chrono::microseconds;

namespace ba = boost::archive;
namespace bs = boost::serialization;
namespace fs = boost::filesystem;

// ============================================================================

struct pixels
{
    std::vector<cv::Point> indexes;
    cv::Mat values;
};

struct test_results
{
    bool matches;
    double write_time_ms;
    double read_time_ms;
    size_t file_size;
};

// ----------------------------------------------------------------------------

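bool validate(pixels const& pix_out, pixels const& pix_in)
{
    if (pix_out.indexes != pix_in.indexes) return false;
    // Comparing matrices of different size or type would throw, so check first.
    if (pix_out.values.size() != pix_in.values.size()) return false;
    if (pix_out.values.type() != pix_in.values.type()) return false;
    // cv::countNonZero expects single-channel input, so flatten the channels.
    cv::Mat const diff = (pix_out.values != pix_in.values);
    return cv::countNonZero(diff.reshape(1)) == 0;
}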

pixels generate_data()
{
    pixels pix;
    for (int i(0); i < 10000; ++i) {
        pix.indexes.emplace_back(i, 2 * i);
    }
    pix.values = cv::Mat(1024, 1024, CV_8UC3);
    cv::randu(pix.values, 0, 256);

    return pix;
}

void dump_results(std::string const& label, test_results const& results)
{
    std::cout << label << "\n";
    std::cout << "Matched = " << (results.matches ? "true" : "false") << "\n";
    std::cout << "Write time = " << results.write_time_ms << " ms\n";
    std::cout << "Read time = " << results.read_time_ms << " ms\n";
    std::cout << "File size = " << results.file_size << " bytes\n";
    std::cout << "\n";
}

// ============================================================================

Using OpenCV FileStorage

The first obvious choice is to use the serialization functionality OpenCV provides -- cv::FileStorage, cv::FileNode and cv::FileNodeIterator. There's a nice tutorial in the 2.4.x documentation, which I can't seem to find right now in the new docs.

The advantage here is that we already have support for cv::Mat and cv::Point, so there's very little to implement.

However, all the formats provided are textual, so there will be a fairly large cost in reading and writing the values (especially for the cv::Mat). It may be advantageous to save/load the cv::Mat using cv::imread/cv::imwrite and serialize the filename. I'll leave this to the reader to implement and benchmark.

// ============================================================================

void save_pixels(pixels const& pix, cv::FileStorage& fs)
{
    fs << "indexes" << "[";
    for (auto const& index : pix.indexes) {
        fs << index;
    }
    fs << "]";
    fs << "values" << pix.values;
}

void load_pixels(pixels& pix, cv::FileStorage& fs)
{
    cv::FileNode n(fs["indexes"]);
    if (n.type() != cv::FileNode::SEQ) {
        throw std::runtime_error("Input format error: `indexes` is not a sequence.");
    }

    pix.indexes.clear();
    cv::FileNodeIterator it(n.begin()), it_end(n.end());
    cv::Point pt;
    for (; it != it_end; ++it) {
        (*it) >> pt;
        pix.indexes.push_back(pt);
    }

    fs["values"] >> pix.values;
}

// ----------------------------------------------------------------------------

test_results test_cv_filestorage(std::string const& file_name, pixels const& pix)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        cv::FileStorage fs(file_name, cv::FileStorage::WRITE);

        save_pixels(pix, fs);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        cv::FileStorage fs(file_name, cv::FileStorage::READ);

        load_pixels(pix_in, fs);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================

Using Boost Serialization

Another potential approach is to use the Boost.Serialization library, which you mention you have tried. We have three options for the archive format: two are textual (and portable), and one is binary (non-portable, but much more efficient).

There's more work to do here. We need to provide good serialization for cv::Mat, cv::Point and our pixels structure. Support for std::vector is provided, and to handle XML, we need to generate key-value pairs.

In case of the two textual formats, it may again be advantageous to save the cv::Mat as an image, and only serialize the path. The reader is free to try this approach. For binary format it would most likely be a tradeoff between space and time. Again, feel free to test this (you could even use cv::imencode and imdecode).

// ============================================================================

namespace boost { namespace serialization {

template<class Archive>
void serialize(Archive &ar, cv::Mat& mat, const unsigned int)
{
    int cols, rows, type;
    bool continuous;

    if (Archive::is_saving::value) {
        cols = mat.cols; rows = mat.rows; type = mat.type();
        continuous = mat.isContinuous();
    }

    ar & boost::serialization::make_nvp("cols", cols);
    ar & boost::serialization::make_nvp("rows", rows);
    ar & boost::serialization::make_nvp("type", type);
    ar & boost::serialization::make_nvp("continuous", continuous);

    if (Archive::is_loading::value)
        mat.create(rows, cols, type);

    if (continuous) {
        size_t const data_size(rows * cols * mat.elemSize());
        ar & boost::serialization::make_array(mat.ptr(), data_size);
    } else {
        size_t const row_size(cols * mat.elemSize());
        for (int i = 0; i < rows; i++) {
            ar & boost::serialization::make_array(mat.ptr(i), row_size);
        }
    }
}

template<class Archive>
void serialize(Archive &ar, cv::Point& pt, const unsigned int)
{
    ar & boost::serialization::make_nvp("x", pt.x);
    ar & boost::serialization::make_nvp("y", pt.y);
}

template<class Archive>
void serialize(Archive &ar, ::pixels& pix, const unsigned int)
{
    ar & boost::serialization::make_nvp("indexes", pix.indexes);
    ar & boost::serialization::make_nvp("values", pix.values);
}

}}

// ----------------------------------------------------------------------------

template <typename OArchive, typename IArchive>
test_results test_bs_filestorage(std::string const& file_name
    , pixels const& pix
    , bool binary = false)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        std::ios::openmode mode(std::ios::out);
        if (binary) mode |= std::ios::binary;
        std::ofstream ofs(file_name.c_str(), mode);
        OArchive oa(ofs);

        oa & boost::serialization::make_nvp("pixels", pix);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        std::ios::openmode mode(std::ios::in);
        if (binary) mode |= std::ios::binary;
        std::ifstream ifs(file_name.c_str(), mode);
        IArchive ia(ifs);

        ia & boost::serialization::make_nvp("pixels", pix_in);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================

Raw Data to `std::fstream`

If we don't care about portability of the data files, we can just do the minimal amount of work to dump and restore the memory. With some effort (at the cost of speed) you could make this more flexible.

// ============================================================================

void save_pixels(pixels const& pix, std::ofstream& ofs)
{
    size_t index_count(pix.indexes.size());
    ofs.write(reinterpret_cast<char const*>(&index_count), sizeof(index_count));
    // data() is safe even when the vector is empty, unlike &pix.indexes[0].
    ofs.write(reinterpret_cast<char const*>(pix.indexes.data()), sizeof(cv::Point) * index_count);

    int cols(pix.values.cols), rows(pix.values.rows), type(pix.values.type());
    bool continuous(pix.values.isContinuous());

    ofs.write(reinterpret_cast<char const*>(&cols), sizeof(cols));
    ofs.write(reinterpret_cast<char const*>(&rows), sizeof(rows));
    ofs.write(reinterpret_cast<char const*>(&type), sizeof(type));
    ofs.write(reinterpret_cast<char const*>(&continuous), sizeof(continuous));

    if (continuous) {
        size_t const data_size(rows * cols * pix.values.elemSize());
        ofs.write(reinterpret_cast<char const*>(pix.values.ptr()), data_size);
    } else {
        size_t const row_size(cols * pix.values.elemSize());
        for (int i(0); i < rows; ++i) {
            ofs.write(reinterpret_cast<char const*>(pix.values.ptr(i)), row_size);
        }
    }
}

void load_pixels(pixels& pix, std::ifstream& ifs)
{
    size_t index_count(0);
    ifs.read(reinterpret_cast<char*>(&index_count), sizeof(index_count));
    pix.indexes.resize(index_count);
    // data() is safe even when the vector is empty, unlike &pix.indexes[0].
    ifs.read(reinterpret_cast<char*>(pix.indexes.data()), sizeof(cv::Point) * index_count);

    int cols, rows, type;
    bool continuous;

    ifs.read(reinterpret_cast<char*>(&cols), sizeof(cols));
    ifs.read(reinterpret_cast<char*>(&rows), sizeof(rows));
    ifs.read(reinterpret_cast<char*>(&type), sizeof(type));
    ifs.read(reinterpret_cast<char*>(&continuous), sizeof(continuous));

    pix.values.create(rows, cols, type);

    if (continuous) {
        size_t const data_size(rows * cols * pix.values.elemSize());
        ifs.read(reinterpret_cast<char*>(pix.values.ptr()), data_size);
    } else {
        size_t const row_size(cols * pix.values.elemSize());
        for (int i(0); i < rows; ++i) {
            ifs.read(reinterpret_cast<char*>(pix.values.ptr(i)), row_size);
        }
    }
}

// ----------------------------------------------------------------------------

test_results test_raw(std::string const& file_name, pixels const& pix)
{
    test_results results;
    pixels pix_in;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();
    {
        std::ofstream ofs(file_name.c_str(), std::ios::out | std::ios::binary);

        save_pixels(pix, ofs);
    }
    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    {
        std::ifstream ifs(file_name.c_str(), std::ios::in | std::ios::binary);

        load_pixels(pix_in, ifs);
    }
    high_resolution_clock::time_point t3 = high_resolution_clock::now();

    results.matches = validate(pix, pix_in);
    results.write_time_ms = static_cast<double>(duration_cast<microseconds>(t2 - t1).count()) / 1000;
    results.read_time_ms = static_cast<double>(duration_cast<microseconds>(t3 - t2).count()) / 1000;
    results.file_size = fs::file_size(file_name);

    return results;
}

// ============================================================================
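One way to make the raw format a little more flexible is to prefix the file with a small header and check the stream state while reading. The following is only a sketch: `raw_header`, the magic value and the version number are made up for illustration. The header is still written in native byte order, so it does not make the file portable.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical file header; the magic value and version are arbitrary choices.
struct raw_header
{
    std::uint32_t magic;
    std::uint32_t version;
};

std::uint32_t const RAW_MAGIC = 0x314C5850u;
std::uint32_t const RAW_VERSION = 1u;

// Writes the header (native byte order) and reports stream failures.
void write_header(std::ostream& os)
{
    raw_header const header = { RAW_MAGIC, RAW_VERSION };
    os.write(reinterpret_cast<char const*>(&header), sizeof(header));
    if (!os) {
        throw std::runtime_error("Failed to write header.");
    }
}

// Reads the header and rejects truncated, foreign or newer files.
void read_header(std::istream& is)
{
    raw_header header = { 0, 0 };
    if (!is.read(reinterpret_cast<char*>(&header), sizeof(header))) {
        throw std::runtime_error("Truncated file: header is missing.");
    }
    if (header.magic != RAW_MAGIC) {
        throw std::runtime_error("Unexpected magic value: not a pixels file.");
    }
    if (header.version != RAW_VERSION) {
        throw std::runtime_error("Unsupported file version.");
    }
}
```

With this in place, `save_pixels` would call `write_header(ofs)` before writing anything else, and `load_pixels` would call `read_header(ifs)` before reading the index count.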

Complete `main()`

Let's run all the tests for the various approaches and compare the results.

Code:

// ============================================================================

int main()
{
    namespace ba = boost::archive;

    pixels pix(generate_data());

    auto r_c_xml = test_cv_filestorage("test.cv.xml", pix);
    auto r_c_yaml = test_cv_filestorage("test.cv.yaml", pix);
    auto r_c_json = test_cv_filestorage("test.cv.json", pix);

    auto r_b_txt = test_bs_filestorage<ba::text_oarchive, ba::text_iarchive>("test.bs.txt", pix);
    auto r_b_xml = test_bs_filestorage<ba::xml_oarchive, ba::xml_iarchive>("test.bs.xml", pix);
    auto r_b_bin = test_bs_filestorage<ba::binary_oarchive, ba::binary_iarchive>("test.bs.bin", pix, true);

    auto r_b_raw = test_raw("test.raw", pix);

    // ----

    dump_results("OpenCV - XML", r_c_xml);
    dump_results("OpenCV - YAML", r_c_yaml);
    dump_results("OpenCV - JSON", r_c_json);
    dump_results("Boost - TXT", r_b_txt);
    dump_results("Boost - XML", r_b_xml);
    dump_results("Boost - Binary", r_b_bin);
    dump_results("Raw", r_b_raw);

    return 0;
}

// ============================================================================
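As a side note, each test function above repeats the same timing arithmetic, which could be factored into a small helper. This is a sketch under assumptions: `time_ms` is a hypothetical name, and I've swapped in `std::chrono::steady_clock` for `high_resolution_clock` because it is guaranteed to be monotonic.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: runs `fn` once and returns the elapsed wall time in
// milliseconds as a double, without the manual microseconds / 1000 step.
template <typename Fn>
double time_ms(Fn&& fn)
{
    typedef std::chrono::steady_clock clock_type;
    clock_type::time_point const start = clock_type::now();
    fn();
    clock_type::time_point const stop = clock_type::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

A test function could then write, for example, `results.write_time_ms = time_ms([&] { /* open the file and save */ });` and do the same for the read step.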

Console output (i7-4930k, Win10, MSVC 2013)

NB: We're testing this with 10000 indexes and values being a 1024x1024 BGR image.

OpenCV - XML
Matched = true
Write time = 257.563 ms
Read time = 257.016 ms
File size = 12323677 bytes

OpenCV - YAML
Matched = true
Write time = 135.498 ms
Read time = 311.999 ms
File size = 16353873 bytes

OpenCV - JSON
Matched = true
Write time = 137.003 ms
Read time = 312.528 ms
File size = 16353873 bytes

Boost - TXT
Matched = true
Write time = 1293.84 ms
Read time = 1210.94 ms
File size = 11333696 bytes

Boost - XML
Matched = true
Write time = 4890.82 ms
Read time = 4042.75 ms
File size = 62095856 bytes

Boost - Binary
Matched = true
Write time = 12.498 ms
Read time = 4 ms
File size = 3225813 bytes

Raw
Matched = true
Write time = 8.503 ms
Read time = 2.999 ms
File size = 3225749 bytes

Conclusion

Looking at the results, the textual Boost.Serialization formats are abhorrently slow -- I see what you meant. Saving values separately would definitely bring significant benefit here. The binary approach is quite good if portability is not an issue. You could still fix that at a reasonable cost.

OpenCV performs much better: XML is balanced on reads and writes, while YAML and JSON (apparently identical) are faster on writes but slower on reads. It's still rather sluggish, so writing values as an image and saving the filename might still be of benefit.

The raw approach is the fastest (no surprise), but also inflexible. You could make some improvements, of course, but it seems to need a lot more code than using a binary Boost.Archive -- not really worth it here. Still, if you're doing everything on the same machine, this may do the job.

Personally I'd go for the binary Boost approach, and tweak it if you need cross-platform capability.

I can't see why `Raw` is not portable to other environments. Suppose I created some files on windows using `fstream`. Then on another os, say linux, as long as I can run the code (i.e. I have `g++`), I can read those files with no problem. Do I miss something here? — smttsp, Sep 15 '19 at 10:36
@smttsp Things like endianness, [packing/alignment](https://stackoverflow.com/questions/5397447/struct-padding-in-c), differing sizes of fundamental types (e.g. `long` on 64bit windows vs 64bit linux). — Dan Mašek, Sep 16 '19 at 10:55
