
In terms of

  1. scalability
  2. performance
  3. maintenance
  4. ease of use / learning curve
  5. cost

These are in order of significance, but I wouldn't mind a general answer, as I appreciate I'm probably asking for too much :)

Thanks

EDIT: I'm looking for a database that will serve as the single authoritative data store, and I need all attributes of the stored documents to be indexed for various business reasons. Therefore, I know that other solutions won't do what I'm looking for.

Yannis
  • There are relational dbs, document, key/value, column, graph... There are many brand-name db's for each type. There are entire books written about how/when to use each, with no single right answer, just suggestions and thinking points - *definitely* not something that can go in an answer here. Costs? They're published - no need to do a comparison here. Performance is documented, and you can benchmark - no way to know how your data is modeled and how perf will be impacted. Scalability? Documented. Maintenance? Both are services. – David Makogon Oct 23 '15 at 13:39
  • I was under the impression that SO is a website where, although things are sometimes documented, people are willing to help out. Documented stuff, then? Look at http://stackoverflow.com/questions/10941488/what-is-the-difference-between-an-azure-web-site-and-an-azure-web-role/10941526#10941526 and http://stackoverflow.com/questions/3426360/azure-sql-database-web-vs-business-edition/3521506#3521506 and a million other questions here whose answers exist in some documentation. Why do we spend time answering them, then? – Yannis Oct 23 '15 at 14:41
  • If you don't want to go through the hassle of answering, that's not a problem. You can point me to links comparing these two systems in each of the aspects I mention above. If there is such an abundance of documentation, it wouldn't take more than a Google search, would it? (I have done several, by the way.) – Yannis Oct 23 '15 at 14:43


tl;dr; If you are using JavaScript and building browser apps, node.js and DocumentDB are a match made in heaven. If you are using .NET and/or other Azure services, then DocumentDB is favored. If you are using other AWS services, then SimpleDB might be better.

I know that questions like this are not ideal for Stack Overflow, but I often see value in answers like this and my most popular answer on SO is essentially informed opinion backed by evidence. I have not used SimpleDB but I looked into it before deciding on DocumentDB. I rejected it pretty quickly... although I did give AWS Lambda a serious look before deciding on DocumentDB. So:

  1. scalability. DocumentDB has a very straightforward and explicit scaling model: add more collections if you need either more space or more operations per second. SimpleDB's scaling model is similar but less straightforward, because you add domains, which are overloaded to provide both type separation (think tables) and scalability. You can scale either one to whatever you need.

  2. performance. Since I never built anything on it, I can't say anything about SimpleDB's performance. However, I've been very impressed with the performance of DocumentDB. Latency is less than 10 ms for simple id-based reads, and I get impressive latency and throughput for queries. The DocumentDB implementation of our current app returns complex n-dimensional aggregations (done in stored procedures on DocumentDB using documentdb-lumenize) in 1/4 the time of the functionally equivalent MongoDB/node.js implementation. You'd have to do your own performance testing on your actual application to get a definitive answer here.

  3. maintenance. Both are much more hands-off than traditional data stores; there just aren't that many knobs to turn when maintaining either of them. SimpleDB geographically distributes your data by default. You'd have to do the equivalent manually in DocumentDB: possible, but harder. DocumentDB has good import/export tools, and its backup solution is about to be significantly upgraded.

  4. ease of use / learning curve. If you are a JavaScript programmer, then DocumentDB has a lot to recommend it. DocumentDB uses JSON natively; SimpleDB uses XML. DocumentDB has ACID-enabling stored procedures written in JavaScript. You'd need to combine SimpleDB with something else to get the equivalent (Lambda, maybe, but the XML/JavaScript mismatch would make this less than ideal). Both allow you to use SQL, but DocumentDB also allows JavaScript-native queries.

    There is one huge mindset hurdle that you will have to get over in order to be successful with DocumentDB. Although both scale by adding more domains/collections, SimpleDB domains are conceptually closer to tables. The DocumentDB team's choice of the word "collection" is unfortunate, since collections are more akin to partitions and should not be thought of as tables. The hard part is getting used to the idea that you store all of your different data types in the same collection. Once you get over that, I find DocumentDB's approach refreshing and incredibly flexible: I can efficiently model inheritance and type mixins. Collections (nay, partitions) have one purpose: scalability. Domains are used for both scalability and data-type separation, which is actually harder in practice.

  5. cost. Not much to say here. Both allow you to scale your cost gradually. For really small implementations, DocumentDB is probably more expensive, since the smallest unit of usage is a single collection, which costs a minimum of $25/month. You'd have to do your own modeling/what-if analysis to determine which would be less expensive for you. Note that Azure is being very aggressive on pricing in general and is even pushing AWS to lower prices in some cases. My gut feeling is that they would be roughly equal in cost for most applications.
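
The "store all of your different data types in the same collection" idea from point 4 can be illustrated with a minimal in-memory Python sketch. It does not use any DocumentDB SDK; the `collection` list, its field names and the `query` helper are hypothetical, and the filter simply mimics what a query like `SELECT * FROM c WHERE c.type = "User" AND c.posts > 5` does against a collection that holds users, posts and comments side by side.

```python
from typing import Any, Callable

Doc = dict[str, Any]

# Hypothetical documents: users, posts and comments all live in ONE collection.
# The "type" field is the discriminator that a separate table would otherwise provide.
collection: list[Doc] = [
    {"id": "u1", "type": "User", "name": "Joe", "posts": 7},
    {"id": "u2", "type": "User", "name": "Mary", "posts": 2},
    {"id": "p1", "type": "Post", "author": "u1", "title": "Hello"},
    {"id": "c1", "type": "Comment", "post": "p1", "author": "u2", "text": "Hi!"},
]


def query(docs: list[Doc], doc_type: str,
          predicate: Callable[[Doc], bool] = lambda d: True) -> list[Doc]:
    """Rough stand-in for: SELECT * FROM c WHERE c.type = <doc_type> AND <predicate>."""
    return [d for d in docs if d.get("type") == doc_type and predicate(d)]


# Roughly: SELECT * FROM c WHERE c.type = "User" AND c.posts > 5
busy_users = query(collection, "User", lambda d: d["posts"] > 5)
print([u["name"] for u in busy_users])  # ['Joe']
```

Because the type lives in the document rather than in the container, adding a new kind of document (or a subtype with extra fields) needs no new collection; collections are then added only when you need more space or throughput.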

Other thoughts:

  • You wrote, "I need all attributes of the documents stored to be indexed". One really nice feature of DocumentDB is that you can specify the size of your indexes. By default, every field is indexed into a 3-byte-per-field hash index, which is highly space efficient. I do not know if SimpleDB has an equivalent.

  • This is a bit like comparing apples to oranges. I consider DocumentDB to be like MongoDB or CouchDB in its data model and like VoltDB in its execution model (although VoltDB sprocs are written in Java). SimpleDB feels more like a simple XML object store. If you already have a strong XML mindset, it might be easier, but I think more folks are using JSON today than XML.

  • Writing ACID-enabling stored procedures in JavaScript is a killer feature that only DocumentDB has. Some say the days of stored procedures are over and that you should put all such logic in your application server layer. If you are implementing a simple CRUD API, that may be true, but almost every application requires some sort of transaction where more than one row is changed at a time. This is mind-bogglingly hard to do correctly without transaction support in your data store. Even if you implement the equivalent of transactions on top of your NoSQL database, the overhead of that implementation eats away any development/performance/scalability advantages you gained by choosing NoSQL over SQL.

  • DocumentDB's user-defined functions and triggers (also written in JavaScript) might be useful, although I believe the trigger implementation is crippled at the moment and I haven't found a use for UDFs myself yet.

  • DocumentDB has built-in attachment support. You need to integrate manually with S3 for the equivalent on AWS.

  • DocumentDB has geo indexing and operators.

  • SimpleDB's 1 KB limit (on each attribute value, with at most 256 attribute name/value pairs per item) is a serious limitation. This tells me that it's designed mostly for logging or as an index to S3, not as a full-fledged document store. The limit for DocumentDB is 512 KB per document.
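
To illustrate what "more than one row changed at a time" needs from the data store, here is a minimal in-memory Python sketch of all-or-nothing (atomic) updates. It is not DocumentDB code: real DocumentDB stored procedures are JavaScript running inside the database, and the `Store` class, `transaction` helper and `balance` field below are hypothetical. It only demonstrates atomicity; it does nothing for isolation between concurrent writers or for durability, which is part of why building transactions yourself on a store without them is so hard.

```python
import copy
from contextlib import contextmanager
from typing import Any, Iterator


class Store:
    """Hypothetical in-memory document store keyed by id (not a real DocumentDB API)."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}

    @contextmanager
    def transaction(self) -> Iterator[dict[str, dict[str, Any]]]:
        # Work on a deep copy; publish it only if the block finishes without an exception.
        working = copy.deepcopy(self.docs)
        yield working
        self.docs = working  # not reached if the with-block raised


def transfer(store: Store, src: str, dst: str, amount: int) -> None:
    """Change two documents together: either both updates are kept or neither is."""
    with store.transaction() as docs:
        docs[src]["balance"] -= amount
        if docs[src]["balance"] < 0:
            raise ValueError("insufficient funds")  # aborts: the debit above is discarded
        docs[dst]["balance"] += amount


store = Store()
store.docs = {"a": {"balance": 100}, "b": {"balance": 0}}
transfer(store, "a", "b", 30)
try:
    transfer(store, "a", "b", 500)
except ValueError:
    pass
print(store.docs)  # {'a': {'balance': 70}, 'b': {'balance': 30}}
```

The failed second transfer leaves both documents exactly as the first transfer left them, instead of a debited "a" with no matching credit to "b".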

If comparison to SimpleDB is like apples to oranges, then comparison to ElasticSearch is like apples to fire engines. My impression of ElasticSearch is that it's all about full-text search and analytics. I don't think it's space/execution/API efficient enough to serve as a primary transactional store. Built on Lucene, it was not designed to have the reliability/durability to be your primary store. Further, even when hosted, it's more of an IaaS offering, whereas DocumentDB and SimpleDB are true PaaS offerings. The maintenance burden will be much higher with ElasticSearch.

Larry Maccherone
  • Thank you very, very much for your answer. Would you be able to elaborate slightly on "The hard part is getting used to the idea that you store all of your different data types in one collection"? Also, I understand exactly what you mean about DDB collections being like partitions, but you do need to write a broker that decides which "partition" your data is in, right? i.e. if the user is Joe go to collection XYZ, or if the user is Mary go to collection 123? – Yannis Oct 23 '15 at 15:48
  • Sure, in a single DocumentDB collection, you would store all of your document types (e.g. users, posts, and comments). You typically have another field to indicate the type and make that part of your query (e.g. `SELECT * FROM c WHERE c.type = "User" AND c.posts > 5`). You could partition on user as you mention. The .NET SDK for DocumentDB provides both a consistent hash resolver (what I think you mean by "broker") as well as a range resolver. They are actively working on equivalent resolvers for the node.js and Java SDKs. If you like the answer, can you please "accept" it? – Larry Maccherone Oct 23 '15 at 17:06
  • Accepted, sir. Thank you. Last question: could you point me to the info on the .NET "resolver"? (I'm not sure we mean the same thing. I mean the thing where you pass an id and it tells you to look for it in collection XYZ. You will have to split the collections given the 10 GB limit, right? So you will have your data in more than one collection.) – Yannis Oct 23 '15 at 17:10
  • That post explains the two included resolvers (range and hash), but it also has links to other implementations, including a spillover resolver. – Larry Maccherone Oct 23 '15 at 17:50
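
The consistent-hash idea behind such a resolver can be sketched in a few lines of Python. This is not the DocumentDB .NET SDK's resolver or its API; the `ConsistentHashResolver` class, the collection names, the use of MD5 and the 64 virtual nodes per collection are all hypothetical choices for illustration. Given a partition key such as a user id, it answers the question from the comments: which collection should I look in?

```python
import bisect
import hashlib


class ConsistentHashResolver:
    """Hypothetical resolver mapping a partition key (e.g. a user id) to a collection name."""

    def __init__(self, collections: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash point, collection name)
        for name in collections:
            self.add_collection(name)

    @staticmethod
    def _hash(value: str) -> int:
        # Stable across processes, unlike Python's built-in hash() for str.
        return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")

    def add_collection(self, name: str) -> None:
        # Each collection gets several "virtual node" points so keys spread evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{name}#{i}"), name))

    def resolve(self, partition_key: str) -> str:
        """Return the owner of the first ring point at or after the key's hash (wrapping)."""
        if not self._ring:
            raise LookupError("no collections registered")
        idx = bisect.bisect_left(self._ring, (self._hash(partition_key), ""))
        return self._ring[idx % len(self._ring)][1]


resolver = ConsistentHashResolver(["coll-A", "coll-B"])
users = [f"user{i}" for i in range(1000)]
before = {u: resolver.resolve(u) for u in users}

# Simulate adding a collection, e.g. because an existing one is approaching its size limit.
resolver.add_collection("coll-C")
after = {u: resolver.resolve(u) for u in users}

moved = [u for u in users if before[u] != after[u]]
print(resolver.resolve("Joe"), len(moved))
```

Because the existing ring points never move, the only keys that get remapped are those whose hash now lands on one of coll-C's new points, and all of them move to coll-C. A naive `hash(key) % number_of_collections` scheme would instead remap most keys whenever a collection is added.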