19

Does anyone have a good suggestion as to what database I should use, to achieve replication across a variable number of targets? I have a mesh network of Raspberry Pi servers, each of which can contain a database. I want the contents of each database to be replicated across the network, but I can't guarantee what nodes are available at any point in time.

Most nosql databases (CouchDB, Cassandra for example) appear to only support defined targets in the configuration.

So (assuming nosql is the best database option); is there a nosql database that can replicate to variable number of targets?

Community
  • 1
  • 1
Geoff Munn
  • 191
  • 2
  • 1
    It would be good to have some information about the amount of data, frequency of additions updates and deletes, and the acceptable propagation latency. Also the rate at which nodes permanently join or leave the network. – cliffordheath Oct 09 '15 at 01:00

5 Answers5

4

For this scenario I would recommend the Hadoop Distributed File System (HDFS).

Features that make HDFS attractive to your scenario:

  • It is a distributed file system with variable replication factor (the default is 3 which is nearly impossible to lose data with).
  • Can scale up to thousands of different machines
  • Does not depend on high availability of individual nodes -- automatically handles node failure and replicates any data from downed nodes

As for the actual database... HBase, Mongo, or Cassandra are all good options here, pick whatever you are most comfortable with -- HDFS will take care of all of the replication for you.

Andy Guibert
  • 41,446
  • 8
  • 38
  • 61
3

According to this SO response:

https://stackoverflow.com/a/8787999/2020565

And upon cheking their website, maybe you should check Elliptics: http://www.ioremap.net/projects/elliptics/

The network does not use dedicated servers to maintain the metadata information, it supports redundant objects storage. Small to medium sized write benchmarks can be found on eblob page.

Community
  • 1
  • 1
noderman
  • 1,934
  • 1
  • 20
  • 36
3

In my experience Elasticsearch has great and easy-to-use cluster management, it supports out of the box nice features such as node autodiscovery, data replication, auto-rebalancing etc., have a look at docs. Usually it is used to replicate data from an other database to make it searchable but I don't see why it couldn't be used in this context as well.

Basically when you create a "table" (called "index" in ES) you get to decide that in how many "partitions" (called "shards") the data should be partitioned, and ad-hoc set that how many replicas of that table you want to have (this doesn't 100% match the correct terminology since an "index" can consist of multiple "types" but I think this is the best analogy).

An example project with three Pis is here.

I have read a bit about Cassandra as well and I imagine it would have similar features, for example partitions and replicas are mentioned here.

NikoNyrh
  • 3,578
  • 2
  • 18
  • 32
  • 1
    Other databases might have lower RAM and CPU requirements as Elasticsearch is optimized for 10 - 100 ms query times on millions of documents. It isn't just a simple key-value store. – NikoNyrh Oct 15 '15 at 20:55
2

I'd recommend taking a look at Hazelcast. They do pretty good in memory replication across a cluster that might change. You'd have to write a custom client to store the data into a local database of your choice if you want disk backed persistence, but Hazelcast can take care of replication across a cluster in memory and has a lot of flexibility.

sg7
  • 6,108
  • 2
  • 32
  • 40
Josh Wilson
  • 3,585
  • 7
  • 32
  • 53
  • 1
    A few years ago, we had Hazelcast running on a cluster of Raspberry Pi machines: http://i0.wp.com/venturebeat.com/wp-content/uploads/2013/09/img_20130920_113757.jpg?fit=800%2C600 – pveentjer Oct 16 '15 at 08:41
0
  1. You should consider Erlang OTP platform and Mnesia database

  2. If you prefer C language you can consider SQlite in memory database together with nanomsg framework

sg7
  • 6,108
  • 2
  • 32
  • 40
Amit Vujic
  • 1,632
  • 1
  • 24
  • 32