3

I'm a little bit confused about the terminology of Boost Library (for windows). What I'm trying to do is simply; create a file on disk (a big file >50 GB) do some mapping for write and read operations seperately.

For example first map 1 gb portion for writing & after that flush it to the hard drive take a new portion and so on, while the reader applications maps different parts of the file and do the reading stuff without changing anything (no edit).

I'm reading the documentation of boost (1.47.0 version since we allowed to use this one) and I don't understand exactly when to use Memory Mapped Files methods like: file_mapping, managed_region and Managed Map File: basic_managed_mapped_file and Offset_Ptr for instance.

Can anyone please tell me what is the difference between Memory Mapped Files and Managed Mapped File and what are the usages of them?

Some example codes would be highly apopriciated about these and Offset_ptr as well if possible.

Thank you indeed...

user2955554
  • 309
  • 3
  • 14

1 Answers1

4

You can use the managed_mapped_file to transparently allocate from a memory mapped file.

This means that for all practical purposes you often don't need to dubdivide your memory areas. It's all virtual memory anyways, so paging takes care of loading the right bits at the required times.

Obviously, if there's a lot of fragmentation or accesses "jumping around" then paging might become a performance bottleneck. In that case, consider subdividing into pools and allocate from those.)_

Edit Just noticed Boost IPC has support for this under Segregated storage node allocators and Adaptive pool node allocators. There are also notes about the implementation of these storage pools here.

Here's a simple starting point that creates a 50Gb file and stuffs some data in it:

#include <iostream>
#include <string>
#include <vector>
#include <iterator>
#include <algorithm>

#include <boost/container/flat_map.hpp>
#include <boost/container/flat_set.hpp>

#include <boost/interprocess/managed_mapped_file.hpp>
#include <boost/container/scoped_allocator.hpp>
#include <boost/interprocess/containers/string.hpp>
#include <boost/interprocess/containers/vector.hpp>
#include <boost/interprocess/sync/named_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

namespace bip = boost::interprocess;
using mutex_type    = bip::named_mutex;

struct X
{
    char buf[100];
    double rate;
    uint32_t samples[1024];
};

template <typename T> using shared_alloc  = bip::allocator<T,bip::managed_mapped_file::segment_manager>;
template <typename T> using shared_vector = boost::container::vector<T, shared_alloc<T> >;
template <typename K, typename V, typename P = std::pair<K,V>, typename Cmp = std::less<K> >
                      using shared_map    = boost::container::flat_map<K, V, Cmp, shared_alloc<P> >;

using shared_string = bip::basic_string<char,std::char_traits<char>,shared_alloc<char> >;
using dataset_t     = shared_map<shared_string, shared_vector<X> >;

struct mutex_remove
{
    mutex_remove() { mutex_type::remove("7FD6D7E8-320B-11DC-82CF-39598D556B0E"); }
    ~mutex_remove(){ mutex_type::remove("7FD6D7E8-320B-11DC-82CF-39598D556B0E"); }
} remover;

static mutex_type mutex(bip::open_or_create,"7FD6D7E8-320B-11DC-82CF-39598D556B0E");

static dataset_t& shared_instance()
{
    bip::scoped_lock<mutex_type> lock(mutex);
    static bip::managed_mapped_file seg(bip::open_or_create,"./demo.db", 50ul<<30); // "50Gb ought to be enough for anyone"

    static dataset_t* _instance = seg.find_or_construct<dataset_t>
        ("DATA")
        (
         std::less<shared_string>(), 
         dataset_t::allocator_type(seg.get_segment_manager())
        );

    static auto capacity = seg.get_free_memory();
    std::cerr << "Free space: " << (capacity>>30) << "g\n";

    return *_instance;
}

int main()
{
    auto& db = shared_instance();

    bip::scoped_lock<mutex_type> lock(mutex);
    auto alloc = db.get_allocator().get_segment_manager();

    std::cout << db.size() << '\n';

    for (int i = 0; i < 1000; ++i)
    {
        std::string key_ = "item" + std::to_string(i);
        shared_string key(alloc);
        key.assign(key_.begin(), key_.end());
        auto value = shared_vector<X>(alloc);
        value.resize(size_t(rand()%(1ul<<9)));
        auto entry = std::make_pair(key, value);

        db.insert(std::make_pair(key, value));
    }
}

Note that it writes a sparse file of 50G. The actual size commited depends on a bit of random there. My run resulted in a roughly 1.1G:

$ du -shc --apparent-size demo.db 
50G demo.db

$ du -shc demo.db 
1,1G    demo.db

Hope this helps

sehe
  • 374,641
  • 47
  • 450
  • 633
  • Thanks for the answer, but I'm not sure that I understand all clearly because its a little bit "high level" for me. So basicly what I'm trying to do is constructing a buffer for 1 writer application and 8 reader applications working simultaniously on the sam machine. The writer puts data like 12 MB/s and the reader applications do some search and read operations over the data on buffer. So I thought that creating a file on disk map some portion to RAM for writing, after fillling this portion flush it to disk. But a little bit confused about the "how" part. Am i on the right way at least? – user2955554 Jan 09 '14 at 07:55
  • Just an addition: Also the reader apps shall do some search over the data on buffer so they might do some mapping oprations on their address spaces as well. Do i need those Segregated storage node allocators and Adaptive pool node allocators that you told above? – user2955554 Jan 09 '14 at 07:59
  • Multiple processes can map the same file (area) into their process space _at the same time_. This is why you need synchronization (the named mutex in my sample). There is no need to flush/read in order for the other process to see it. The file backing is (just) for persistence. – sehe Jan 09 '14 at 08:04