
I use OpenCV with SURF descriptors. For each image I get a descriptor matrix, and I concatenate these into one big matrix (cv::Mat) to run a FLANN index search. The problem is that once the big matrix grows past about 30 GB, I no longer have enough free memory. Right now I serialize the matrix to a binary file, and I have one workaround: I cap the maximum size and split the data into several files, so after every search operation I read the files one by one from a queue and then combine the results (see the sketch after the class below). Is there a more efficient solution that reads the data only once?

//code

class FlannIndexModel
{
    cv::Ptr<cv::flann::Index> flannIndex;
    cv::Mat dbDescs;
    string fileName;
public:
    vector<IndecesMappingModel> imap;

    FlannIndexModel(string fileName)
    {
        this->fileName = fileName;
        flannIndex = new cv::flann::Index();
    }

    size_t Size()
    {
        size_t sizeInBytes = dbDescs.total() * dbDescs.elemSize();
        return sizeInBytes / 1000000; // returns megabytes, not bytes
    }
    void Load()
    {
        FileManager::LoadMat(dbDescs, (fileName + "_desc.bin"));
        FileManager::LoadImap(imap, (fileName + "_imap.bin"));
        flannIndex->load(dbDescs, (fileName + "_flann.bin"));

        cout << " Flann Load: " << " dbDescs rows= " << dbDescs.rows << " imap= " << imap.size() << endl;
    }
    void Save()
    {
        FileManager::SaveMat(dbDescs, (fileName + "_desc.bin"));
        FileManager::SaveImap(imap, (fileName + "_imap.bin"));
        flannIndex->save((fileName + "_flann.bin"));
    }
    void Add(vector<ImageDescModel> images)
    {
        vector<cv::Mat> descs;

        int r = dbDescs.rows;

        // Existing rows must come first so the index ranges already stored
        // in imap (which start at r = dbDescs.rows) remain valid after the
        // concatenation below.
        if (!dbDescs.empty())
            descs.push_back(dbDescs);

        for (size_t i = 0; i < images.size(); i++)
        {
            auto desc = images[i].Desc;
            if (desc.empty())
                continue;
            descs.push_back(desc);
            imap.push_back(IndecesMappingModel(images[i].FileName, r, r + desc.rows - 1));
            r += desc.rows;
        }
        vconcat(descs, dbDescs);
    }
    void Calcul()
    {
        // Build a KD-tree index (4 trees) over all database descriptors.
        flannIndex->build(dbDescs, cv::flann::KDTreeIndexParams(4));
    }


    vector<IndecesMappingModel> Search(cv::Mat queryDescriptors, int num)
    {
        for (auto &img : imap)
        {
            img.Similarity = 0;
        }

        // Retrieve the two nearest neighbours per query descriptor for the
        // ratio test below.
        cv::Mat indices(queryDescriptors.rows, 2, CV_32S);
        cv::Mat dists(queryDescriptors.rows, 2, CV_32F);
        flannIndex->knnSearch(queryDescriptors, indices, dists, 2, cv::flann::SearchParams(24));

        // Lowe's ratio test; the increment is atomic so the loop can run in
        // parallel without racing on Similarity.
#pragma omp parallel for
        for (int i = 0; i < indices.rows; i++)
        {
            if (dists.at<float>(i, 0) < (0.6 * dists.at<float>(i, 1)))
            {
                for (auto &img : imap)
                {
                    if (img.IndexStart <= indices.at<int>(i, 0) && img.IndexEnd >= indices.at<int>(i, 0))
                    {
#pragma omp atomic
                        img.Similarity++;
                        break;
                    }
                }
            }
        }

        // Relies on IndecesMappingModel::operator< ranking images by similarity.
        std::sort(imap.begin(), imap.end());

        if (imap.size() > (size_t)num)
        {
            vector<IndecesMappingModel> result(imap.begin(), imap.begin() + num);
            return result;
        }
        else
        {
            return imap;
        }
    }

};
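
For clarity, here is a minimal sketch of the split-and-merge workaround described in the question, built on the FlannIndexModel class above. The chunk file list and the merge step are illustrative assumptions, not part of the original code:

    vector<IndecesMappingModel> SearchAllChunks(const vector<string> &chunkFiles,
                                                const cv::Mat &queryDescriptors,
                                                int num)
    {
        vector<IndecesMappingModel> merged;
        for (const auto &file : chunkFiles)
        {
            FlannIndexModel model(file);
            model.Load(); // re-reads descriptors, imap and index from disk on every query
            auto partial = model.Search(queryDescriptors, num);
            merged.insert(merged.end(), partial.begin(), partial.end());
        }
        // Keep the globally best matches, using the same operator< the class relies on.
        std::sort(merged.begin(), merged.end());
        if (merged.size() > (size_t)num)
            merged.resize(num);
        return merged;
    }

This bounds peak memory by the largest chunk, but it re-reads every chunk from disk on each query, which is exactly the inefficiency the question asks about.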
  • I'd suggest that you have a look at [Efficiently reading a very large text file in C++](http://stackoverflow.com/questions/26736742/efficiently-reading-a-very-large-text-file-in-c) and [Fast textfile reading in c++](http://stackoverflow.com/questions/17925051/fast-textfile-reading-in-c). I find the `mmap()` technique (second link) quite interesting. You can read more on that on [Beej's Guide to Unix IPC](http://beej.us/guide/bgipc/output/html/multipage/mmap.html) – maddouri Oct 10 '15 at 23:19
  • Boost seems to provide a portable way of doing memory-mapped file IO: [source_1](http://www.boost.org/doc/libs/1_59_0/libs/iostreams/doc/classes/mapped_file.html) [source_2](http://www.boost.org/doc/libs/1_59_0/doc/html/interprocess/sharedmemorybetweenprocesses.html#interprocess.sharedmemorybetweenprocesses.mapped_file) – maddouri Oct 10 '15 at 23:29
  • The problem is that I need to load the full Mat to pass it to the OpenCV function. Do you mean that I should map all my files (matrices), so that when I start a search the data is loaded through the mapping rather than read from disk directly? – Karim Alabtah Oct 11 '15 at 09:42
  • Is there a solution like using an SSD disk as RAM? – Karim Alabtah Oct 11 '15 at 09:48
  • Not sure about "ssd disk as RAM", you might want to have a look at this [article talking about the linux _swap_](http://www.thegeekstuff.com/2010/08/how-to-add-swap-space/). Concerning your previous comment, I'd suggest that you don't split your matrix: Just save the whole thing, then `mmap()` it to some pointer, which you "wrap" in a `cv::Mat` [source_1](http://answers.opencv.org/question/8202/using-external-image-data-in-a-cvmat/) [source_2](http://stackoverflow.com/q/22461003/865719). – maddouri Oct 11 '15 at 10:00
  • Could you please update your original post by adding some code so we can get a clearer view on the problem ? – maddouri Oct 11 '15 at 10:01
  • yes, I have updated. – Karim Alabtah Oct 11 '15 at 10:19
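
Following up on the mmap() suggestion in the comments, here is a minimal sketch of mapping the serialized descriptor file and wrapping it in a cv::Mat without copying. It is POSIX-only and assumes the file contains nothing but raw, contiguous CV_32F data (no header), with the row and column counts stored elsewhere; MapDescriptors is a hypothetical helper, not an OpenCV API:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <unistd.h>
    #include <string>
    #include <opencv2/core/core.hpp>

    cv::Mat MapDescriptors(const std::string &path, int rows, int cols)
    {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd < 0)
            return cv::Mat();

        size_t bytes = (size_t)rows * cols * sizeof(float);
        void *data = mmap(nullptr, bytes, PROT_READ, MAP_SHARED, fd, 0);
        close(fd); // the mapping remains valid after the descriptor is closed
        if (data == MAP_FAILED)
            return cv::Mat();

        // cv::Mat does not take ownership of external data, so nothing is
        // copied: pages are faulted in lazily by the OS as FLANN touches them.
        return cv::Mat(rows, cols, CV_32F, data);
    }

Because the OS pages the data in on demand and can evict it under memory pressure, the 30 GB matrix never has to fit in RAM at once. The caller must munmap() the pointer once the matrix (and the FLANN index built on it) is no longer in use.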
