28

I am looking for a lightweight open source paging B+ tree implementation that uses a disk file for storing the tree.

So far I have found only memory-based implementations, or something that has dependency on QT (?!) and does not even compile.

Modern C++ is preferred, but C will do too.

I prefer to avoid full embeddable DBMS solution, because: 1) for my needs bare bone index that can use the simplest possible disk file organization is enough, no need for concurrency, atomicity and everything else. 2) I am using this to prototype my own index, and most likely will change some of the algorithms and storage layout. I want to do that with a minimum of effort. It's not going to be production code.

Jonas
  • 121,568
  • 97
  • 310
  • 388
Laurynas Biveinis
  • 10,547
  • 4
  • 53
  • 66
  • 1
    Did you find any implementation. Because I have the same needs as yours. Also I cannot use present DBMS solutions because of dependencies. – Jannat Arora Feb 16 '13 at 20:48
  • @JannatArora, I ended up writing my own (incomplete; insertions and queries only) B+-tree on the top of http://libspatialindex.github.com/ disk I/O routines – Laurynas Biveinis Feb 17 '13 at 09:16

7 Answers7

9

http://people.csail.mit.edu/jaffer/WB.

You can also consider re-using the B-Tree implementations from an open source embeddable database. (BDB, SQLite etc)

Vijay Mathew
  • 26,737
  • 4
  • 62
  • 93
4

My own implementation is under http://www.die-schoens.de/prg License is Apache. Its disk-based, maps to shared memory where it also can do locking (i. e. multiuser), file format protects against crash etc. All of the above can be easily switched off (compile or runtime if you like). So bare bone would be almost ANSI-C, basically caching in ones own memory and not locking at all. Test program is included. Currently, it only deals with fixed-size fields, but I am working on that...

3

Faircom's C-Tree Plus has been available commercially for over 20 years. Don't work for them etc... FairCom

There is also Berkley DB which was bought by Oracle but is still free from their site.

AnthonyLambert
  • 8,768
  • 4
  • 37
  • 72
3

I second the suggestion for Berkeley DB. I used it before it was bought by Oracle. It is not a full relational database, but just stores key-value pairs. We switched to that after writing our own paging B-Tree implementation. It was a good learning experience, but we kept adding features until is was just a (poorly) implemented version of BDB.

If you want to do it yourself, here is an outline of what we did. We used mmap to map pages into memory. The structure of each page was index based, so with the page start address you could access any element on the page. Then we mapped and unmapped pages as necessary. We were indexing multi GB text files, back when 1 GB of main memory was considered a lot.

KeithB
  • 16,577
  • 3
  • 41
  • 45
1

I' pretty sure it's not the solution you're looking but why don't you store the tree in a file yourself? All you need is an approach for serialization and an if/ofstream.

Basically you could serialize it like that: go to root, write '0' in your file a divider like '|', the number of elements in root and then all root elements. Repeat with '1' for level 1 and so on. As long as you don't change the level keep the level index, empty leafs could look like 2|0.

DaClown
  • 4,171
  • 6
  • 31
  • 31
  • 1
    I am not looking to build the tree in a memory and then serialize it. I am looking to build the tree on disk, having only a subset of its nodes in memory at any given time. – Laurynas Biveinis Nov 12 '09 at 08:48
  • You're looking for paging a B-tree then. Maybe you could clarify this in your question? – DaClown Nov 12 '09 at 10:13
  • 1
    It's the first time I hear that it is considered "paging" tree, but here you go. – Laurynas Biveinis Nov 12 '09 at 10:50
  • 1
    It's not paging tree. What you have is a data structure of which you want to handle a subset in memory but have the whole structure stored on hard drive. Loading subsets of data that is greater than your memory is called paging. – DaClown Nov 12 '09 at 14:53
1

You could look at Berkeley DB, its supported ny Oracle but it is open source and can be found here.

Jackson
  • 5,627
  • 2
  • 29
  • 48
1

RogueWave, the software company, have a nice implementation of BTreeOnDisk as part of their Tools++ product. I've been using it since late 90's. The nice thing about it is that you can have multiple trees in one file. But you do need a commercial license.

In their code they do make a reference to a book by a guy called 'Ammeraal' (see http://home.planet.nl/~ammeraal/algds.html , Ammeraal, L. (1996) Algorithms and Data Structures in C++). He seems to have an imlementation of a BTree on disk, and the source code seems to be accessible online. I have never used it though.

I currently work on projects for which I'd like to distribute the source code so I need to find an open source replacement for the Rogue Wave classes. Unfortunately I don't want to rely on GPL type licenses, otherwise a solution would be simply to use 'libdb' or equivalent. I need a BSD type license, and for a long time I couldn't find anything suitable. But I will have a look at some the links in earlier posts.