7

Functional programming languages often work on immutable data structures but stay efficient by structural sharing. E.g. you work on some map of information, if you insert an element, you will not modify the existing map but create a new updated version. To avoid massive copying and memory usage, the map will share (as good as possible) the unchanged data between both instances.

I would be interested if there exists some template library providing such a map like data structure for C++. I searched a bit and found nothing, beside internal classes in LLVM.

Ivaylo Strandjev
  • 69,226
  • 18
  • 123
  • 176
Cullmann
  • 71
  • 4
  • Do not quite understand your last line: "To avoid massive copying and memory usage, the map will share (as good as possible) the unchanged data between both instances." This means that the structure itself is mutable, it only changes for new additions / modifications, but the entire memory footprint (including starting address) remains same. That is very much the case for mutables. Is it that you want mutable data structure? – Nirav Bhatt Feb 03 '13 at 18:54
  • 1
    @NiravBhatt: No, he means immutable data structures. – Ira Baxter Feb 03 '13 at 19:10
  • @Cullman: dunno if you will find any, but such structures are relatively easy to build. Given whatever existing structure you have, when a "change" is propose, simply build new structure for the part that must change, and use pointers to the old part of the structure that can be preserved. – Ira Baxter Feb 03 '13 at 19:11
  • What is wrong with those internal classes in LLVM that you found? In general purpose libraries I haven't met such immutable containers, also several other useful container types are rare (for example trie is rare and even such mundane thing like ring-buffer). – Öö Tiib Feb 03 '13 at 19:19
  • @Ira: Actually I researched a bit and no, I object, efficient & error free structural sharing data structures are not that easy to build. – Cullmann Feb 03 '13 at 20:06
  • @Öö: The LLVM data structures need to much from the other LLVM internals, not self-contained enough. And if I factor them out, I will need to resync on any change/bugfix. – Cullmann Feb 03 '13 at 20:07
  • 1
    I don't know of any existing implementations in C++, but in case you haven't come across this, the Wikipedia article on [Hash Array Mapped Trie](http://en.wikipedia.org/wiki/Hash_array_mapped_trie) mentions a couple of implementations, and links to the Clojure implementation which is written in reasonably standalone Java (not short but tractable and battle tested). Might serve as a starting point. – unthought Feb 03 '13 at 20:54
  • I also recommend looking at the very similar question "[Functional data structures in C++](http://stackoverflow.com/q/2757278/1067887)". Although it doesn't have a good answer either, it does have a very good (short) discussion in the comments on the question (Pascal Cuoq and sigfpe). – unthought Feb 03 '13 at 21:08
  • I'd like to see one in straight C. I would love to have Clojure's map/dict in Python. – MRocklin Feb 10 '13 at 20:39

2 Answers2

3

A Copy On Write b+tree sounds like what your looking for. It basically creates a new snapshot of itself every time it gets modified but it shares unmodified leaf nodes between versions. Most of the implementations I've seen tend to be baked into append only database log files. CouchDB has a very nice write up on them. They are however "relatively easy", as far as map data structures go, to implement.

blaise
  • 307
  • 1
  • 5
  • That's a good hint! Actually even if easy to implement, I would favor some already well tested and proven in practice variant to avoid massive own testing. – Cullmann Feb 04 '13 at 09:33
  • Copy-On-Write B-Tree was beaten by Stratified B-Tree http://www.slideshare.net/acunu/20110620-stratifiedbtree – Brian Cannard Apr 26 '16 at 11:06
0

You can use an ordinary map, but marking every element with a timestamp or "map version number". If you want to remove elements too, use two marks. If you might reinsert removed elements, then you need a list of values and pairs of marks per element.

For example, you search for the key "foo", and you find that it had the value 5 in versions 0 to 3 (included), then it was "removed", and then it had the value -8 in versions 9 to current.

This eats a lot of memory and time, though.

comocomocomocomo
  • 4,772
  • 2
  • 17
  • 17
  • Actually, as you mention yourself, this won't help me a lot, as it would remove the benefits I want, space + time efficiency. – Cullmann Feb 04 '13 at 09:33