
I need a database that can store a massive amount of simple structured data and make it available relatively fast.

The use-case is quite simple, so I will explain it in a few words to make things clearer:

A "large" number of GPS devices will constantly send data to a Java server for storage. The data is stored and accessed with the device ID and the current date as key(s). The server offers the data to a relatively "small" number of clients, but they need it very quickly, for live tracking and some calculations. The ratio of "small" to "large" is roughly 1 to 100.
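
To make the access pattern concrete, here is a minimal in-memory sketch in Java. The names (`GpsFix`, `TrackStore`), the fields, and the key format are hypothetical, and the sorted map only stands in for whatever database ends up being used; it assumes timestamps after 1970 and treats a "day" as a UTC day.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical record shape: one GPS fix reported by one device.
final class GpsFix {
    final String deviceId;
    final Instant timestamp;
    final double latitude;
    final double longitude;

    GpsFix(String deviceId, Instant timestamp, double latitude, double longitude) {
        this.deviceId = deviceId;
        this.timestamp = timestamp;
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

// In-memory stand-in for the database, keyed by "deviceId|epochMillis".
final class TrackStore {
    private final ConcurrentSkipListMap<String, GpsFix> fixes = new ConcurrentSkipListMap<>();

    // Zero-padded millis make the string order of keys match time order.
    private static String key(String deviceId, Instant t) {
        return deviceId + "|" + String.format(Locale.ROOT, "%013d", t.toEpochMilli());
    }

    // Write path: called for every fix a device sends.
    void store(GpsFix fix) {
        fixes.put(key(fix.deviceId, fix.timestamp), fix);
    }

    // Read path: all records for device X on day Y, oldest first.
    List<GpsFix> forDeviceOnDay(String deviceId, LocalDate day) {
        Instant from = day.atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant to = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        // Half-open range [from, to) over the sorted keys of this device.
        return new ArrayList<>(fixes.subMap(key(deviceId, from), key(deviceId, to)).values());
    }
}
```

Writes are single-key puts and reads are one contiguous range per device and day, which is the whole query surface I need.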

What kind of database would you recommend for this application?

With no real referential integrity needed and no data structure to speak of, an RDBMS does not seem like the right tool for this, but I'm kind of lost in the NoSQL world, and some advice from first-hand experience would be nice.

Other requirements are:

  • Free to use and OSS
  • Some kind of Java API
  • Strong community
  • And of course reliable - a few invalid data entries aren't a big problem, but the database must always be in a usable state
Cœur
Frank
  • Do you want distribution? If you want distribution of data, I think you are really looking at Cassandra (focused on availability and partition tolerance), HBase (consistency and partition tolerance), Voldemort (availability, partition tolerance), or Riak (availability, partition tolerance). They all have Java interfaces (the first 3 are written in Java as well). – wkl Oct 23 '11 at 17:49
  • Do you really need to save the data? If not, you might consider using a data stream management system. I don't know if there is one that fits your requirements, but I thought it was worth mentioning. – hage Oct 23 '11 at 17:54
  • You have conflicting statements in your question. You start off asking about storing "simple structured data" then state there is "no data structure to speak of"... Further, it really seems that a GPS device *is* going to send structured data. Can you give a few examples of what you are trying to store and how it really needs to be queried? – NotMe Oct 23 '11 at 20:11
  • @birryee Thanks for the suggestions, distribution of the database will definitely become important later! – Frank Oct 24 '11 at 09:22
  • @hage: The data has to be stored for at least a month, but I think I will cache the data for the live tracking. – Frank Oct 24 '11 at 09:28
  • @ChrisLively: The data is not completely unstructured; the insert date and the device ID of the records will be used for querying the database. For example: give me all records for device X for day Y. – Frank Oct 24 '11 at 09:29

1 Answer


Most NoSQL companies and fans, if not all, will tell you that their solution is the best. So whoever tells you which solution fits best (including me) is just voicing an opinion => there is no absolute truth in any answer to this question.

With that out of the way, I would definitely recommend either Riak or CouchDB, and here is why:

  • And of course reliable - a few invalid data entries aren't a big problem, but the database must always be in a usable state

Reliability comes in many shapes and forms (e.g. number of 9s of uptime, reliable reads, [eventually] consistent data, etc.). To my knowledge there is only one platform that was written with reliability in mind and has been battle-tested in production for decades => Erlang/OTP.

Riak and CouchDB are both built on this platform and rely on it for fault tolerance, uptime, reliability, etc.

  • Strong community

Both Riak and CouchDB have strong communities. What I really love about Couchers and Riaksters is their passion, their human approach to things, and their desire to help.

  • Simple Structure / Massive Writes

Riak is a simple {key, value} store with the additional feature of secondary indices. As for massive writes, its backend data store is pluggable: you can even use Google's LevelDB, which is incredibly fast and copes with "massive" amounts of data.
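
As a rough illustration of what "key/value plus a secondary index" means for the GPS use case, here is a tiny in-memory model in Java. This is not the Riak client API: `IndexedKvStore`, its methods, and the key and index formats are all made up for the example, and it assumes each key is written only once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListSet;

// Toy model of a key/value store with one secondary index (not a real client API).
final class IndexedKvStore {
    // Primary data: key -> value.
    private final Map<String, String> values = new ConcurrentHashMap<>();
    // Secondary index: index entry -> sorted set of primary keys tagged with it.
    private final Map<String, Set<String>> index = new ConcurrentHashMap<>();

    // Stores the value under its primary key and tags it with one index entry.
    // Assumes write-once keys; re-tagging an existing key is not handled.
    void put(String key, String indexValue, String value) {
        values.put(key, value);
        index.computeIfAbsent(indexValue, k -> new ConcurrentSkipListSet<>()).add(key);
    }

    // Plain key lookup, the basic operation of any key/value store.
    String get(String key) {
        return values.get(key);
    }

    // Returns all values tagged with the given index entry, in key order.
    List<String> findByIndex(String indexValue) {
        List<String> result = new ArrayList<>();
        for (String key : index.getOrDefault(indexValue, Collections.<String>emptySet())) {
            String value = values.get(key);
            if (value != null) {
                result.add(value);
            }
        }
        return result;
    }
}
```

With a primary key such as `dev1:2011-10-23T17:49:00Z` and an index entry such as `dev1:2011-10-23`, the "all records for device X on day Y" query becomes a single `findByIndex("dev1:2011-10-23")` call.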

  • Some kind of Java API

Riak Java Client, Couch Java Clients

  • Free to use and OSS

Both are FREE and OSS, and.. awesome!

tolitius
  • CouchDB is definitely on my list. I never heard about Riak, but it sounds interesting and I will look into it. Do you have any first hands experiences with the two databases? For what kind of application do you use it? – Frank Oct 24 '11 at 09:35
  • I do have experience with both. [This](http://stackoverflow.com/questions/7399533/is-nosql-database-good-for-online-money-transaction-management/7531504#7531504) summarizes how I decide _when_ to use them. And here is the official usage data for both: [Who is Using Riak](http://wiki.basho.com/Who-is-Using-Riak.html), [CouchDB In The Wild](http://wiki.apache.org/couchdb/CouchDB_in_the_wild) – tolitius Oct 24 '11 at 14:04