7

So boost provides nice spatial indexing capabilities in the form of R-trees. This is neat, however it doesn't seem possible yet to serialize the tree once it is built, am I wrong?

The usual "out_archive << rtree" syntax doesn't work because rtree doesn't have a serialize() member. In boost 1.57 there seems to be some experimental code for that, e.g. /boost/geometry/index/detail/serialization.hpp, but it actually doesn't seem to compile!

So first question : does someone know how to serialize an R-tree with boost?

If not, then my second question : how would you go about to permanently store your index on disk to avoid having to rebuild it every time? (I have a dataset of 145M entries and it takes several hours to build the index so I really don't want to have to build it more than once!)

genpfault
  • 51,148
  • 11
  • 85
  • 139
jerorx
  • 568
  • 6
  • 19
  • 1
    See http://stackoverflow.com/questions/13599557/persistent-disk-based-r-tree-or-r-tree for a similar question. I think it makes no sense to add persistence to a memory-based R-Trees implementation if we are talking about 145M entries. Instead, use an implementation designed for persistent storage of index infos. – TheBlastOne Jan 14 '15 at 14:36
  • Another idea would be to create a virtual heap (using a disk file as virtual storage), and map all calls for allocation of, access to and deallocation of R-Tree items to the appropriate virtual heap API. The "swap" file of that virtual memory heap then would be your persistent index, and during runtime, you could hold as many entries in-memory as possible, caching the swapfile as memory is available. – TheBlastOne Jan 14 '15 at 14:38

1 Answers1

6
  1. Packing Algorithm & Bulk Loading

    It's possible to load packs (using the Packing algorithm).

    Additionally there are also algorithms creating R-tree containing some, number of objects. This technique is called bulk loading and is done by use of packing algorithm [5] [6]. This method is faster and results in R-trees with better internal structure. This means that the query performance is increased.

    [5] Leutenegger, Scott T.; Edgington, Jeffrey M.; Lopez, Mario A. (1997). STR: A Simple and Efficient Algorithm for R-Tree Packing

    [6] Garcia, Yvan J.; Lopez, Mario A.; Leutenegger, Scott T. (1997). A Greedy Algorithm for Bulk Loading R-trees

    More details: http://www.boost.org/doc/libs/1_57_0/libs/geometry/doc/html/geometry/spatial_indexes/introduction.html

  2. Using memory mapped files

    You can use a memory mapped file with custom allocators. This way you can use any representation you wish and it will automatically persist

    More details: http://www.boost.org/doc/libs/1_57_0/libs/geometry/doc/html/geometry/spatial_indexes/rtree_examples/index_stored_in_mapped_file_using_boost_interprocess.html

sehe
  • 374,641
  • 47
  • 450
  • 633
  • Memory mapped files seem to do what I want, thanks! The resulting files are big, but they lend themselves very well to compression so it's ok. – jerorx Jan 15 '15 at 11:16
  • Heh. That's slightly confusing seeing that you cannot memory mapped a compressed file transparently. I guess you want to save (multiple) offline copies then. – sehe Jan 15 '15 at 11:21