
Is there an efficient Java implementation of a filesystem-based key-value storage with the following features:

  1. Store, overwrite, and retrieve byte arrays by a unique ID (may be assigned by the storage)
  2. No memory caching (read means read from file system, write means write to file system immediately)
  3. Total data size up to a few terabytes
  4. Number of stored objects up to hundreds of millions
  5. Manageable number of file system objects (to move/copy/delete entire storage on file system level)

Will Berkeley DB JE do?

Andrey Logvinov
  • Why the aversion to lots of file system objects? This is quite easy to manage if you drop that requirement. Or heck, you can probably just use SQLite. – wowest Dec 14 '11 at 21:15
  • I certainly would not expect SQLite to scale well to "a few terabytes". – Gray Dec 14 '11 at 21:28
  • Berkeley DB JE would certainly be on my list of things to try if nothing else. I think a SQL database would be tons slower. I can't imagine doing a move/copy/delete of terabyte sized files though. – Gray Dec 14 '11 at 21:31
  • Similar question here: http://stackoverflow.com/q/2654709/896405 – teejay May 01 '16 at 13:17

4 Answers


Why not simply format a dedicated partition with a file system of your choice and store each object as a file named by its ID? The file system would meet requirements 1-4, and requirement 5 can be met by moving, copying, or deleting the whole partition.
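A minimal sketch of the idea in Java, storing one file per ID on that partition. It assumes NIO.2 (Java 7+), a dedicated partition mounted at some root directory, and `long` IDs; `FileBlobStore`, its methods, and the two-level hex directory fan-out are hypothetical illustrations, not an existing library. The class keeps no cache of its own (the OS page cache still applies), calls `FileChannel.force(true)` after each write, and overwrites through a temp file plus an atomic rename so readers see either the old or the new bytes.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical file-per-object store: one file per ID under a two-level hex fan-out. */
public class FileBlobStore {
    private final Path root;

    public FileBlobStore(Path root) throws IOException {
        this.root = Files.createDirectories(root);
    }

    /** Maps an ID to root/xx/yy/<16 hex digits>, where yy is the lowest byte and xx the next one. */
    Path pathFor(long id) {
        String hex = String.format("%016x", id);
        return root.resolve(hex.substring(12, 14)).resolve(hex.substring(14, 16)).resolve(hex);
    }

    /** Stores data under a new storage-assigned random 63-bit ID and returns that ID. */
    public long put(byte[] data) throws IOException {
        while (true) {
            long id = ThreadLocalRandom.current().nextLong() & Long.MAX_VALUE;
            Path p = pathFor(id);
            Files.createDirectories(p.getParent());
            // CREATE_NEW fails atomically if the ID is taken. Simplification: a crash
            // mid-write can leave a partial file; a sturdier version would use a temp file.
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
                writeFully(ch, data);
                ch.force(true); // push the bytes to the device before reporting success
                return id;
            } catch (FileAlreadyExistsException e) {
                // ID already in use (very unlikely with 63 random bits): pick another one.
            }
        }
    }

    /** Creates or replaces the object with the given ID via temp file + atomic rename. */
    public void put(long id, byte[] data) throws IOException {
        Path target = pathFor(id);
        Files.createDirectories(target.getParent());
        Path tmp = Files.createTempFile(target.getParent(), "tmp-", ".part");
        try {
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
                writeFully(ch, data);
                ch.force(true);
            }
            // With ATOMIC_MOVE, replacing an existing target is implementation specific;
            // rename(2) on Linux/POSIX file systems does replace it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // do not leave half-written temp files behind
            throw e;
        }
    }

    /** Reads the object straight from the file system; returns null if the ID is unknown. */
    public byte[] get(long id) throws IOException {
        try {
            return Files.readAllBytes(pathFor(id));
        } catch (NoSuchFileException e) {
            return null;
        }
    }

    private static void writeFully(FileChannel ch, byte[] data) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(data);
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

Typical use would be `FileBlobStore store = new FileBlobStore(Paths.get("/mnt/blobs"))` (the mount point is hypothetical), then `long id = store.put(bytes)` and `store.get(id)`. The fan-out gives 256 × 256 = 65,536 leaf directories, so 300 million objects come to roughly 4,600 files per directory. Copying or deleting the store file by file would still touch every object, which is why whole-store operations are done at the partition level here. For full crash durability of a rename, the parent directory may also need to be synced, which this sketch does not do.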

meriton
  • As soon as the data size was "terabytes" and single-keyed off an ID, I was thinking the same thing. I don't know of a Java reference offhand, but every email server in recent memory (exim, postfix, sendmail, qmail) works like this, I believe. – BRPocock Dec 14 '11 at 23:14

I suggest MapDB. It provides concurrent Maps, Sets and Queues backed by disk storage or off-heap memory, and it is lightweight and hackable.

Zhou Tai

This may work. It looks like it fits your case and is worth a look:

http://xtreemfs.blogspot.com/2008/11/babudb-efficient-key-value-store-for.html

There is also a presentation with details on how it works:

http://www.xtreemfs.org/slides/BabuDB-SNAPI.pdf

Stas

Perhaps HBase, although you would need to run the whole Hadoop stack, which may well be overkill! http://hbase.apache.org/

user1098798