
I need a way to do key-value lookups across (potentially) hundreds of GB of data. Ideally something based on a distributed hashtable, that works nicely with Java. It should be fault-tolerant, and open source.

The store should be persistent, but would ideally cache data in memory to speed things up.

It should be able to support concurrent reads and writes from multiple machines (reads will be 100X more common though). Basically the purpose is to do a quick initial lookup of user metadata for a web-service.

Can anyone recommend anything?

Jonas
sanity
  • What are you optimizing for? For example, read throughput (concurrent reads from multiple machines), fault tolerance in the face of machines becoming not available, low number of machines... Do you also need writes? – Alexander Oct 13 '08 at 15:38
  • Thanks, I've edited the question with this information. – sanity Oct 13 '08 at 15:41
  • How do you want your data distributed? Should all of the data be available to/on/from every node or not? In the first case the next question is "why the distributed lookup?". – Alexander Oct 13 '08 at 15:56

10 Answers


You might want to check out Hazelcast. It is distributed/partitioned, super lite, easy and free.

java.util.Map map = Hazelcast.getMap("mymap");
map.put("key1", "value1");

Regards,

-talip


Open Chord is an implementation of the Chord protocol in Java. It is a distributed hash table protocol that should fit your needs perfectly.
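At its core, Chord is consistent hashing: each node owns the keys between its predecessor's ID and its own, so adding or removing a node only moves the keys adjacent to it on the ring. A toy illustration of that idea in plain Java (this is not Open Chord's actual API, just a sketch of the principle, with made-up node names):

```java
import java.util.Map;
import java.util.TreeMap;

// Toy consistent-hash ring illustrating the idea behind Chord.
// Class and method names are invented for illustration.
public class HashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addNode(String node) {
        // Mask to a non-negative hash so positions sort cleanly on the ring.
        ring.put(node.hashCode() & 0x7fffffff, node);
    }

    // A key belongs to the first node at or after its hash,
    // wrapping around to the start of the ring if necessary.
    public String nodeFor(String key) {
        int h = key.hashCode() & 0x7fffffff;
        Map.Entry<Integer, String> e = ring.ceilingEntry(h);
        return (e != null) ? e.getValue() : ring.firstEntry().getValue();
    }
}
```

That stability under membership change is what makes Chord-style lookup attractive for hundreds of GB spread over many machines.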

Nicholas Mancuso

Depending on the use case, Terracotta may be just what you need.

Alex Miller

You should probably specify if it needs to be persistent or not, in memory or not, etc. You could try: http://www.danga.com/memcached/

carson

Distributed hash tables include Tapestry, Chord, and Pastry. One of these should suit your needs.


OpenChord sounds promising, but I'd also consider BDB, or any other non-SQL hashtable. Making it distributed can be dead easy (at least if the number of storage nodes is almost constant): just hash the key on the client to pick the appropriate server.
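The client-side hashing described here fits in a few lines of Java; the server addresses below are hypothetical placeholders:

```java
import java.util.Arrays;
import java.util.List;

// Minimal client-side sharding sketch: hash the key to pick a server.
// Only safe while the server list stays (almost) constant, as noted above.
public class ShardPicker {
    public static String pickServer(String key, List<String> servers) {
        // Mask hashCode to non-negative; Math.abs(Integer.MIN_VALUE) is still negative.
        int bucket = (key.hashCode() & 0x7fffffff) % servers.size();
        return servers.get(bucket);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("bdb-1:4000", "bdb-2:4000", "bdb-3:4000");
        System.out.println(pickServer("user:1234", servers));
    }
}
```

The catch is that changing the node count remaps nearly every key, which is why consistent hashing is preferable when nodes come and go.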

Javier

Open Source Cache Solutions in Java

Oracle Coherence (used to be Tangosol)

JCache JSR

ykaganovich

Try the distributed Map structure from Redisson; it's based on the Redis server. Using a Redis cluster configuration, you can split data across up to 1000 servers.

Usage example:

Redisson redisson = Redisson.create();

ConcurrentMap<String, SomeObject> map = redisson.getMap("anyMap");
map.put("123", new SomeObject());
map.putIfAbsent("323", new SomeObject());
map.remove("123");

...

redisson.shutdown();
Nikita Koksharov

nmdb sounds like it's exactly what you need: a distributed, in-memory cache with persistent on-disk storage. Current back-ends include qdbm, Berkeley DB, and (recently added after a quick email to the developer) Tokyo Cabinet. Key/value size is limited, though I believe that limit can be lifted if you don't need TIPC support.

Phillip B Oldham

DNS has the capability to do this. I don't know how large each of your records is (tons of small records adding up to hundreds of GB?), but it may work.

Ryan Stille
  • 1,364
  • 1
  • 13
  • 19