I would like to build a B+tree that spans a multi-node
computer network (an internal subnet of Linux PCs) to provide
elastic, massive-scale storage. Range scans are important.
Is this essentially the underlying data structure of
distributed DB systems such as Cassandra and HBase?
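To make that concrete, here is a rough sketch of the kind of range-partitioned index I'm picturing (all host names and split keys below are made up, and it's in-memory only): each host owns a contiguous key range, B+tree-leaf style, so a range scan only has to touch the hosts whose ranges overlap the query.

```python
import bisect

class RangePartitionMap:
    """Maps a totally ordered key space onto hosts: each host owns a
    contiguous key range, so range scans only contact the hosts whose
    ranges overlap the requested interval."""

    def __init__(self, split_keys, hosts):
        # split_keys are the (inclusive) upper bounds of each partition, sorted.
        assert len(split_keys) == len(hosts)
        self.split_keys = split_keys
        self.hosts = hosts

    def host_for(self, key):
        i = bisect.bisect_left(self.split_keys, key)
        return self.hosts[min(i, len(self.hosts) - 1)]

    def hosts_for_range(self, lo, hi):
        # A range scan [lo, hi] only contacts the hosts covering that interval.
        first = bisect.bisect_left(self.split_keys, lo)
        last = bisect.bisect_left(self.split_keys, hi)
        return self.hosts[first:min(last, len(self.hosts) - 1) + 1]

# Hypothetical cluster of four PCs:
pmap = RangePartitionMap(["g", "n", "t", "~"],
                         ["pc01", "pc02", "pc03", "pc04"])
print(pmap.host_for("hello"))          # -> pc02
print(pmap.hosts_for_range("c", "p"))  # -> ['pc01', 'pc02', 'pc03']
```

Obviously in a real system that partition map would itself have to be replicated and kept consistent, which is part of what I'm asking about.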
Is there any research out there on distributed B+Trees?
I saw the Skip B-Trees paper at
http://www.cs.yale.edu/homes/aspnes/papers/opodis2005-b-trees-final.pdf
but Skip B-Trees simply drop faulty nodes from the structure (so their data is lost).
I'm particularly interested in B+trees with built-in redundancy:
if a host fails and all the tree nodes it serves go offline,
I'd like a replica host to be promoted to primary and take the
place of the failed host (see the sketch below).
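Roughly, this is the failover behaviour I'm after, as a toy sketch (host names hypothetical): each partition has one primary and some replicas, and when the primary dies a surviving replica is promoted so the data stays available, assuming the replicas were kept in sync.

```python
class Partition:
    def __init__(self, key_range, primary, replicas):
        self.key_range = key_range     # e.g. ("a", "g")
        self.primary = primary
        self.replicas = list(replicas)

    def handle_host_failure(self, failed_host):
        # Drop the failed host from the replica set, if it was one.
        if failed_host in self.replicas:
            self.replicas.remove(failed_host)
        if self.primary == failed_host:
            if not self.replicas:
                raise RuntimeError("partition lost: no replicas left")
            # Promote the first surviving replica to primary.
            self.primary = self.replicas.pop(0)
        # A real system would now re-replicate onto a healthy host
        # to restore the desired replication factor.

p = Partition(("a", "g"), primary="pc01", replicas=["pc05", "pc09"])
p.handle_host_failure("pc01")
print(p.primary, p.replicas)   # -> pc05 ['pc09']
```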
I don't want to use a collection of independent DB instances
(one node, one DB), because manual sharding is a poor choice
for a massively scaled storage system running on commodity
x86/x64 hardware with a FOSS OS.
Am I reinventing the wheel?
Should I just use Cassandra or HBase?