15

We are looking at using a NoSQL database system for a large project. So far we have read a bit about MongoDB and Cassandra, though we have no experience with either. We are very proficient with traditional relational databases like MySQL and Microsoft SQL Server, but the NoSQL (key/value store) approach is a new paradigm for us.

So basically, which NoSQL database do you guys recommend for our use?

We do both heavy writes and reads. Basically we have tens of thousands of devices that are reporting:

device_id (int), latitude (decimal), longitude (decimal), date/time (datetime), heading (char(2)), speed (int)

Each device reports once a minute, so at peak times we need to be able to process hundreds of writes a second.

We also have users querying this information, for example "give me all messages from device_id 1234 for the last day, or last week". Users also run other queries, such as "give me all messages from device_id 1234 where speed is greater than 50 and the date is today".

So, our initial thought is that MongoDB or Cassandra will allow us to scale this much more easily than a traditional database would.

For us, a document or value in MongoDB or Cassandra might look like:

{
   device_id: 1234,
   location: [-118.12719739973545, 33.859012351859946],
   datetime: 1282274060,
   heading: "N",
   speed: 34
}
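For reference, the two queries described above could be written as MongoDB filter documents roughly as follows. This is only a sketch: the collection name `messages` and the helper function names are made up for illustration, and it assumes `datetime` is stored as Unix seconds, as in the sample document. "Today" is approximated here as the last 24 hours.

```javascript
// Hypothetical helpers: they only build plain filter objects, no database needed.
const DAY_SECONDS = 24 * 60 * 60;

// All messages from one device at or after a given Unix timestamp.
function messagesSince(deviceId, sinceEpoch) {
  return { device_id: deviceId, datetime: { $gte: sinceEpoch } };
}

// Same, restricted to messages faster than minSpeed.
function fastMessagesSince(deviceId, sinceEpoch, minSpeed) {
  return { ...messagesSince(deviceId, sinceEpoch), speed: { $gt: minSpeed } };
}

const nowEpoch = 1282274060; // timestamp taken from the sample document
const lastDayFilter = messagesSince(1234, nowEpoch - DAY_SECONDS);
const fastTodayFilter = fastMessagesSince(1234, nowEpoch - DAY_SECONDS, 50);

// In the mongo shell these would be used as, for example:
//   db.messages.find(lastDayFilter).sort({ datetime: -1 })
//   db.messages.find(fastTodayFilter)
console.log(JSON.stringify(lastDayFilter));
console.log(JSON.stringify(fastTodayFilter));
```

A compound index such as `{ device_id: 1, datetime: 1 }` would probably be the natural starting point for both queries.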

Which system do you guys recommend? Thanks greatly.

Justin

5 Answers

17

MongoDB has built-in support for geospatial indexes: http://www.mongodb.org/display/DOCS/Geospatial+Indexing

For example, to find the 10 devices closest to that location, you can just do

db.devices.find({location: {$near: [-118.12719739973545, 33.859012351859946]}}).limit(10)
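Note that `$near` needs a geospatial index on the queried field. A minimal sketch of the index spec and the filter as plain objects follows; the helper name `nearFilter` is made up for illustration.

```javascript
// Index specification: a "2d" geospatial index on the location field.
const geoIndexSpec = { location: "2d" };

// Hypothetical helper that builds the $near filter for a [longitude, latitude] pair.
function nearFilter(lng, lat) {
  return { location: { $near: [lng, lat] } };
}

const nearestFilter = nearFilter(-118.12719739973545, 33.859012351859946);

// In the mongo shell (createIndex in current releases, ensureIndex in 2010-era ones):
//   db.devices.createIndex(geoIndexSpec)
//   db.devices.find(nearestFilter).limit(10)
console.log(JSON.stringify(geoIndexSpec), JSON.stringify(nearestFilter));
```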
mstearn
  • A strong reason why we ditched mysql. – sdot257 Aug 20 '10 at 17:50
  • Thanks greatly for the reply, seems like the builtin MongoDB geospatial is going to be very useful. – Justin Aug 20 '10 at 21:50
  • Where is a good starting point for us to read, and are there any tutorials or videos? Also, we want to allow users to store metadata for each update. What would be a good data structure for this? Something like: { device_id: 1234, location: [-118.12719739973545, 33.859012351859946], datetime: 1282274060, heading: "N", speed: 34, metadata: { key: "value", key: "value", key: "value" } } – Justin Aug 20 '10 at 21:55
  • 1
    Take a look at http://github.com/RedBeard0531/Mongo_Presentations/blob/master/20100521-mongony/geospatial_script.js#L55-97. It was used as part of this talk http://blip.tv/file/3681531. Note that if you want to use the new spherical distance calculations (http://www.mongodb.org/display/DOCS/Geospatial+Indexing#GeospatialIndexing-NewSphericalModel), you'll need to reverse the coordinates and put X/longitude first (as you did in your example). – mstearn Aug 27 '10 at 03:41
1

I have a post on a location-based app using MongoDB, just like the one you described. MongoDB's strong query and index support might make it the better choice for you. Like Cassandra, MongoDB has partitioning and replication for scaling reads and writes, although their underlying architectures are very different.

Although you have not mentioned any location-based queries, if you are interested in queries like "give me all the devices within radius r of location l and between times t1 and t2", you will find MongoDB's geospatial querying and indexing extremely useful.
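As a rough illustration, a radius-plus-time-window query could be expressed with `$geoWithin` and `$centerSphere` (available from MongoDB 2.4 onward). The helper name, the field names from the question's sample document, and the sample coordinates are assumptions. `$centerSphere` takes its radius in radians, so a distance in kilometres is divided by the Earth's radius.

```javascript
const EARTH_RADIUS_KM = 6378.1; // approximate equatorial radius

// Hypothetical helper: documents within radiusKm of [lng, lat] and with t1 <= datetime <= t2.
function withinRadiusAndTime(lng, lat, radiusKm, t1, t2) {
  return {
    location: {
      $geoWithin: { $centerSphere: [[lng, lat], radiusKm / EARTH_RADIUS_KM] },
    },
    datetime: { $gte: t1, $lte: t2 },
  };
}

const radiusFilter = withinRadiusAndTime(-118.127, 33.859, 5, 1282270000, 1282274060);

// In the mongo shell, assuming a collection named devices:
//   db.devices.find(radiusFilter)
console.log(JSON.stringify(radiusFilter));
```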

Bo Persson
Pranab
0

Go with MongoDB for geolocation search. Release 2.4 improves on the core geo features. Lots of big sites use it for geolocation search.

Sam Taha
  • 1
    I'm considering using it for a new project. Could you list a few of the big sites that use it for its core geo features? – johnpaulhayes Nov 19 '13 at 15:36
0

You might consider using ElasticSearch. ES keeps the JSON of the original document stored, along with all the indexed fields. JSON can be deserialized into native variables or objects in any modern language. In Java, one could even disable that and store native Java persistence data in a field. After search retrieval, just loop through the results and instantiate a collection of the original object types.

Using ElasticSearch gives you trie indexes for high-speed numeric range queries, full-text search of every flavor, and geographic bounding-box queries, all combinable with AND or OR filtering. Date searches are also native (although Java's handling of dates sucks, so I switched to BIGINT timestamps to represent dates).

UNLIKE some past and maybe present NoSQL solutions, the geographic indexing and querying is PART of any query and no extra steps are required. For example, one MongoDB solution in the recent past required a geospatial search to collect the matching document IDs; you then used those IDs inside another query and searched within them for your other criteria. In reality, that's what happens in all solutions anyway, but it's much faster and cached in ElasticSearch.
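To make that concrete, here is a sketch of a single search body that combines a device filter, numeric ranges, and a geographic bounding box in one `bool` query. It assumes an index named `messages` (hypothetical), a `location` field mapped as `geo_point`, and made-up bounding-box coordinates around the question's sample location.

```javascript
// Request body for something like: POST /messages/_search
// Every clause under bool.filter must match (AND semantics).
const esSearchBody = {
  query: {
    bool: {
      filter: [
        { term: { device_id: 1234 } },
        { range: { speed: { gt: 50 } } },
        { range: { datetime: { gte: 1282274060 - 86400 } } }, // last 24 hours
        {
          geo_bounding_box: {
            location: {
              top_left: { lat: 34.0, lon: -118.3 },
              bottom_right: { lat: 33.7, lon: -118.0 },
            },
          },
        },
      ],
    },
  },
};

console.log(JSON.stringify(esSearchBody, null, 2));
```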

Dennis
0

I have done some work with MongoDB and geospatial data, but not on the scale mentioned above. The geospatial searches are very fast, much more so than in MySQL.

I suggest looking into MongoDB's sharding, replication, and clustering functionality to deal with the volume of writes. Sharding on the device identifier may be a good way to deal with the write volume. If you're interested in the proximity of events, then sharding on lat/lng may be more appropriate.
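As a sketch of the first option, a shard key that leads with the device identifier might look like the following. The namespace `tracking.messages` and the choice to append `datetime` to the key are assumptions for illustration, not a tested recommendation.

```javascript
// Hypothetical namespace (database.collection) and shard key.
const shardNamespace = "tracking.messages";
const shardKey = { device_id: 1, datetime: 1 };

// In the mongo shell, connected to a mongos router:
//   sh.enableSharding("tracking")
//   sh.shardCollection(shardNamespace, shardKey)
// Queries that filter on device_id can then be routed to the shard(s)
// holding that device's data instead of being broadcast to all shards.
console.log(shardNamespace, JSON.stringify(shardKey));
```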

jack

Jack Cox