6

Berkeley DB (JE) licensing may be a deal killer. I have a Java application going to a small set of customers, but because it is a desktop application, my price cannot support per-instance licensing.

Is there a recommended Java alternative to Berkeley DB, commercial or otherwise? (Good key-value store implementations can get non-trivial, and I prefer to defer maintenance elsewhere.) I need more than just a hash store: I'll need to iterate through subsets of consecutive keys, which a basic hash store would turn into an O(m*n) search, and I expect the store to be ~50-60 GiB on a desktop machine. As an added benefit, can anyone recommend one that keeps its backing store in a single file?

leventov
Jé Queue
  • 1
    Any reason it has to be a single, 50-60GiB file? – corsiKa Feb 17 '11 at 17:50
  • I know that sounds strange, but it has to do with desktop IT management and a simple maximum disk size enforcement per node. 1-3 files is fine, but keeping a whole directory tree or some other structure becomes management overhead for laptop rebuilds, &c. – Jé Queue Feb 17 '11 at 17:53
  • 1
    Berkeley DB JE is open source. You don't have to pay anybody for a licence. http://www.oracle.com/technetwork/database/berkeleydb/downloads/jeoslicense-086837.html – JeremyP Feb 17 '11 at 17:56
  • 2
    The way I understand it is Berkeley is dual-licensed and this is a commercial product. http://www.oracle.com/technetwork/database/berkeleydb/downloads/licensing-098979.html – Jé Queue Feb 17 '11 at 18:04
  • Dual-licensed means that if you purchase it from Oracle (with support etc.), you have to pay something. – Thomas Jungblut Feb 17 '11 at 18:42
  • @Thomas Jungblut - correct this would be purchased and supported and redistributed and as such licensing becomes difficult in many-node environments. – Jé Queue Feb 17 '11 at 21:01
  • @Xepoch, please contact me at dave.segleau@oracle.com. I'm the Product Manager for Berkeley DB and I'd like to help you work through the licensing issue, if I can. – dsegleau Feb 18 '11 at 06:22
  • @dsegleau - thank you. Email sent. Cheers. – Jé Queue Feb 18 '11 at 20:23

10 Answers

9

You should definitely try JDBM2, it does what you want:

  • Disk-backed HashMaps/TreeMaps, so you can iterate through keys.
  • Apache 2 license

In addition:

  • Fast, very small footprint
  • Transactional
  • The standalone JAR is only 145 KB.
  • Simple usage
  • Scales well up to 1e9 records
  • Uses Java serialization, no ORM mapping

UPDATE

The project has now evolved into MapDB http://www.mapdb.org
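To illustrate why a sorted (tree) map matters for the question's key-subset iteration, here is a minimal sketch using the in-memory java.util.TreeMap as a stand-in. It assumes the disk-backed tree map exposes the standard java.util.NavigableMap interface; the class name KeyRangeScan, the method valuesWithPrefix, and the "order:"/"user:" key layout are hypothetical and not taken from JDBM2 or MapDB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class KeyRangeScan {
    /** Returns the values whose keys start with the given prefix, in key order. */
    public static List<String> valuesWithPrefix(NavigableMap<String, String> store, String prefix) {
        // Upper bound that sorts after the keys sharing this prefix,
        // assuming keys never contain the character U+FFFF.
        String upperBound = prefix + Character.MAX_VALUE;
        List<String> out = new ArrayList<>();
        // subMap is a view over the sorted keys, so only the matching range is visited.
        for (Map.Entry<String, String> e : store.subMap(prefix, true, upperBound, false).entrySet()) {
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // In-memory stand-in for a disk-backed sorted map (assumption).
        NavigableMap<String, String> store = new TreeMap<>();
        store.put("order:0001", "widget");
        store.put("order:0002", "gadget");
        store.put("user:0001", "alice");
        System.out.println(valuesWithPrefix(store, "order:"));
    }
}
```

With a hash-based store, the same lookup would have to examine every key; with a sorted map, the scan starts at the first matching key and stops at the end of the range.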

Andrejs
5

I think SQLite is exactly what you want: free (public domain), single-file database, zero configuration, small footprint, fast, cross-platform, etc. Here is a list of wrappers; there is a section for Java. Take a look at sqlite4java and read more on Java + SQLite here.

Community
JPelletier
3

It won't be a single file, but if you want an embedded database, I suggest Java DB (a rebranded version of Apache Derby, which I used in a previous job with wonderful results).

Plus, both are completely free.

Edit: reading the other comments, another note: Java DB/Derby is 100% Java.

mdrg
  • My fear is this: when desktops/laptops are re-imaged/replaced (which I guess happens every ~10 months on average, oddly enough) I would likely need to write a migration to pull data from Java DB and re-insert it. Can I literally just copy the Java DB backing files, and can I do this as the JDK versions change? – Jé Queue Feb 17 '11 at 18:02
  • @Xepoch I did this on my machine as a local backup and as a test, and it was simple copy and replace, and the DB was up and running. About JDK, I never tested it, but I suppose you won't have problems, as backwards compatibility is always kept. – mdrg Feb 17 '11 at 19:32
  • I have certainly thought about this approach. I would hope that the binary data is compatible upwardly with JavaDB. I should start another question on JavaDB in Java7. – Jé Queue Feb 17 '11 at 21:00
2

JavaDB aka Derby aka Cloudscape would be a decent choice; it's a pure Java SQL database, and it's included in the JRE, so you don't have to ship it with your code or require users to install it separately.

(It's actually not included in the JRE provided by some Linux package managers, but there it will be a separate package that is trivial to install)

However, Derby has fairly poor performance. An alternative would be H2 - again, a pure Java SQL database that stores a database in a single file, with a ~1MB jar under a redistributable license, but one that is considerably faster and lighter than Derby.

I've happily used H2 for a number of small projects. JBoss liked it enough that they bundled it in AS7. It's trivial to set up, and definitely worth a try.

Tom Anderson
2

Persistit is the new challenger. It's a fast, persistent and transactional Java B+Tree library.

I'm afraid that there's no guarantee that it will still be maintained. Akiban, the company supporting Persistit, was recently acquired by FoundationDB. The latter did not provide any information on the future.

https://github.com/akiban/persistit

Simon Brandhof
2

--- Edited after seeing the size of the file ---

50 to 60 GiB files! You would need to be sure that your DB engine doesn't load all of that into memory at once, and that it is very efficient at handling and reclaiming off-loaded backing blocks.

I don't know if Cloudscape is up to the task, and I wouldn't be surprised if it wasn't.

--- original post follows ---

Cloudscape often fits the bill. It's a bit more than Berkeley DB, but it gained enough traction to be distributed even with some JDK offerings.

Edwin Buck
  • I've thought about this, but I need a desktop support group to be able to move the store/database during laptop changes. Otherwise, we're migrating the desktop data every time. – Jé Queue Feb 17 '11 at 17:54
  • as per the edit, that's correct, I don't need a cache necessarily as most data will be streamed from the file and hit % would be low. – Jé Queue Feb 17 '11 at 17:55
2

Consider ehcache. I show here a class for wrapping it as a java.util.Map. You can easily store Lists or other data structures as your values, avoiding the O(m*n) issue you are concerned with. ehcache is licensed under Apache 2.0, with a commercial enterprise version available from Terracotta. The open source version will allow you to spill your cache to disk, and if you choose not to evict cache entries, it is effectively a persistent key-value store.
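As a rough sketch of the "Lists as values" idea, the following groups related records under one key so that a single lookup returns the whole subset. A plain java.util.HashMap stands in for the Map wrapper around the cache; the class name GroupedValues, its methods, and the month-style keys are hypothetical, and no ehcache API is used here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GroupedValues {
    // Stand-in for a java.util.Map wrapper around the cache (assumption).
    private final Map<String, List<String>> store = new HashMap<>();

    /** Appends a record to the list stored under groupKey. */
    public void add(String groupKey, String record) {
        List<String> current = store.get(groupKey);
        List<String> updated = (current == null) ? new ArrayList<>() : new ArrayList<>(current);
        updated.add(record);
        // Write the list back explicitly: a disk-backed store may hand out
        // copies, so mutating the returned list alone might not persist.
        store.put(groupKey, updated);
    }

    /** Returns every record in the group with one lookup, or an empty list. */
    public List<String> lookup(String groupKey) {
        List<String> found = store.get(groupKey);
        return (found == null) ? Collections.<String>emptyList() : found;
    }

    public static void main(String[] args) {
        GroupedValues g = new GroupedValues();
        g.add("2011-02", "invoice-17");
        g.add("2011-02", "invoice-18");
        System.out.println(g.lookup("2011-02"));
    }
}
```

The trade-off in this sketch is that every append rewrites the whole list for that key, so it suits groups that are read far more often than they are updated.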

harschware
1

I would just like to point out that the storage backend of H2 can also be used as a key-value storage engine if you do not need SQL/JDBC:

http://www.h2database.com/html/mvstore.html

mbelow
0

H2 http://www.h2database.com/

It's a full-blown SQL/JDBC database, but it's lightweight and fast

Maurice Perry
0

Take a look at LMDBJava, Java bindings to LMDB, the fastest sorted ACID key-value store out there.

leventov