25

What kinds of projects benefit from using a NoSQL database instead of an RDBMS wrapped by an ORM?

Examples:

  • Sites similar to Stack Overflow?
  • Social communities?
  • Forums?
jgauffin

2 Answers

69

Your question is very general. NoSQL describes a collection of database techniques that are very different from each other. Roughly, there are:

  • Key-value stores (Redis, Riak)
  • Triplestores (AllegroGraph)
  • Column-family stores (Bigtable, Cassandra)
  • Document-oriented stores (CouchDB, MongoDB)
  • Graph databases (Neo4j)

A project can benefit from a document database during its development phase, because you won't have to design complex entity-relationship diagrams or write complex join queries. I've detailed other uses of document databases in this answer.
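To make the "no join queries" point concrete, here is a minimal sketch in plain Python. The blog data, the `posts`/`comments` tables and both loader functions are hypothetical names made up for illustration; no real database is involved, and dicts and lists merely stand in for tables and documents.

```python
# Hypothetical blog data, used only for illustration.

# Relational style: two flat "tables" that have to be joined on post_id.
posts = [{"id": 1, "title": "Hello NoSQL"}]
comments = [
    {"post_id": 1, "author": "alice", "text": "Nice post"},
    {"post_id": 1, "author": "bob", "text": "Thanks"},
]


def load_post_relational(post_id):
    """Rebuild one post by joining the two tables by hand."""
    post = next(p for p in posts if p["id"] == post_id)
    joined = dict(post)
    joined["comments"] = [
        {"author": c["author"], "text": c["text"]}
        for c in comments
        if c["post_id"] == post_id
    ]
    return joined


# Document style: the post and its comments are stored as one nested document.
documents = {
    1: {
        "id": 1,
        "title": "Hello NoSQL",
        "comments": [
            {"author": "alice", "text": "Nice post"},
            {"author": "bob", "text": "Thanks"},
        ],
    }
}


def load_post_document(post_id):
    """A single lookup returns the whole aggregate; no join is needed."""
    return documents[post_id]
```

Both loaders return the same structure; the difference is that the document version gets it with one lookup, at the cost of duplicating or nesting data that a relational schema would normalize.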

If your application needs to handle very large amounts of data, the development phase will likely be longer when you use a specialized NoSQL solution such as Cassandra. However, when your application goes into production, it will greatly benefit from the performance and scalability of Cassandra.

Very generally speaking, if an application has the following requirements:

  • scale horizontally
  • work with data model X
  • perform Y operations

the application will benefit from using a NoSQL solution that is geared towards storing data model X and performing Y operations on the data. If you need more specific answers regarding a certain type of NoSQL database, you'll need to update your question.

  1. Benefits during development (e.g. easier to use than SQL, no licensing costs)?
  2. Benefits in terms of performance (e.g. runs like hell with a million concurrent users)?
  3. What type of NoSQL database?

Update

Key-value stores can only be queried by key in most cases. They're useful to store simple data, such as user sessions, simple profile data or precomputed values and output. Although it is possible to store more complex data in key-value pairs, it burdens the application with the responsibility of maintaining 'manual' indexes in order to perform more advanced queries.
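The "manual index" burden can be sketched as follows. This is only an illustration under assumptions: a plain Python dict stands in for the key-value store, and the key naming scheme (`user:<id>`, `index:city:<name>`) and the functions `save_user` and `users_in_city` are hypothetical.

```python
# A plain dict stands in for the key-value store (an assumption for
# illustration; a real store would be accessed through its client library).
store = {}


def save_user(user_id, name, city):
    """Store the profile under its key and maintain a manual city index."""
    old = store.get(f"user:{user_id}")
    if old is not None:
        # Remove the stale index entry, otherwise the index drifts out of sync.
        store[f"index:city:{old['city']}"].discard(user_id)
    store[f"user:{user_id}"] = {"name": name, "city": city}
    store.setdefault(f"index:city:{city}", set()).add(user_id)


def users_in_city(city):
    """'Query' by city: read the index key, then fetch each profile by key."""
    ids = store.get(f"index:city:{city}", set())
    return sorted(store[f"user:{uid}"]["name"] for uid in ids)


save_user(1, "alice", "Amsterdam")
save_user(2, "bob", "Berlin")
save_user(3, "carol", "Amsterdam")
save_user(2, "bob", "Amsterdam")  # bob moves; the index must be updated too
```

Every write now has to touch both the record and the index, and the application is responsible for keeping the two consistent. That bookkeeping is what a database with secondary indexes would otherwise do for you.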

Triplestores are for storing Resource Description Framework (RDF) metadata. I don't know anything about these stores beyond what Wikipedia tells me, so you'll have to do some research on them.

Column-family stores are built for storing and processing very large amounts of data. They are used by Google's search engine and Facebook's inbox search. The data is queried by MapReduce functions. Although MapReduce functions may be hard to grasp in the beginning, the concept is quite simple. Here's an analogy which (hopefully) explains the concept:

Imagine you have multiple shoe-boxes filled with receipts, and you want to calculate your total expenses. You invite some of your friends over and assign a person to each shoe-box. Each person writes down the total of each receipt in his shoe-box. This process of selecting the required data is the Map part.

When a person has written down the totals of (some of) his receipts, he can sum up these totals. This is the Reduce part and can be repeated multiple times until all receipts have been handled. In the end, all of your friends come together and sum up their total sums, giving you your total expenses. That's the final Reduce step.

The advantage of this approach is that you can have any number of shoe-boxes and you can assign any number of people to a shoe-box and still end up with the same result. Each shoe-box can be seen as a server in the database's network. Each friend can be seen as a thread on the server. With MapReduce you can have your data distributed across many servers and have each server handle part of the query, optimizing the performance of your database.
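The shoe-box analogy can be written down as a small Python sketch. The receipts and all function names are hypothetical, amounts are kept in integer cents to avoid rounding noise, and everything runs sequentially in one process; a real column-family store would run the per-box work on different servers in parallel.

```python
from functools import reduce

# Hypothetical receipts; each inner list is one shoe-box (one server).
shoe_boxes = [
    [{"shop": "grocer", "total_cents": 1250},
     {"shop": "bakery", "total_cents": 320},
     {"shop": "cinema", "total_cents": 700}],
    [{"shop": "garage", "total_cents": 4000},
     {"shop": "kiosk", "total_cents": 575}],
    [{"shop": "cafe", "total_cents": 125},
     {"shop": "books", "total_cents": 930},
     {"shop": "shoes", "total_cents": 2000},
     {"shop": "stamps", "total_cents": 50}],
]


def map_receipt(receipt):
    """Map step: select only the value we care about from one receipt."""
    return receipt["total_cents"]


def reduce_totals(totals):
    """Reduce step: combine a list of totals into a single sum."""
    return reduce(lambda a, b: a + b, totals, 0)


def process_box(box):
    """What one friend does: map every receipt, then reduce the box."""
    return reduce_totals([map_receipt(r) for r in box])


def total_expenses(boxes):
    """Final reduce: combine the partial sums of all friends."""
    return reduce_totals([process_box(box) for box in boxes])


# Regrouping the same receipts into a single box must not change the result.
one_big_box = [[r for box in shoe_boxes for r in box]]
```

Because addition is associative, `total_expenses(shoe_boxes)` and `total_expenses(one_big_box)` agree, which is the property that lets the work be split across any number of boxes and people.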

Document-oriented stores are explained in this question, so I won't discuss them here.

Graph databases are for storing networks of highly connected objects, such as the users of a social network. These databases are optimized for graph operations, such as finding the shortest path between two nodes, or finding all nodes within three hops of the current node. Such operations are quite expensive on an RDBMS or on other NoSQL databases, but very cheap on graph databases.
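As a rough illustration of the "all nodes within three hops" operation, here is a breadth-first search over an in-memory adjacency list. The `friends` graph and the `hops_from` function are hypothetical; a graph database would run this kind of traversal natively over its stored relationships rather than in application code.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann"],
    "dan": ["bob", "eve"],
    "eve": ["dan", "fay"],
    "fay": ["eve"],
}


def hops_from(graph, start, max_hops):
    """Breadth-first search: map each node within max_hops to its distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]  # the start node itself is not a result
    return distance
```

Calling `hops_from(friends, "ann", 3)` reaches bob, cat, dan and eve but not fay, who is four hops away. In an unweighted graph the recorded distance is also the length of the shortest path. Expressing the same query in SQL typically needs one self-join per hop, which is why it gets expensive on a relational database.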

Niels van der Rest
  • 2
    +1 Good answer. I'll upvote you when I have more votes. – NullUserException Aug 19 '10 at 14:19
  • It's quite ironic, though, that most large social networking sites do not use graph databases but instead use key-value stores like Cassandra or Voldemort. – Joshua Partogi Aug 20 '10 at 09:31
  • 1
    @jpartogi: That's mainly because graph databases don't scale as well as other NoSQL solutions. This is due to the high connectivity between objects, which makes it practically impossible to store all related data on a single server for better performance. I believe Twitter still uses [FlockDB](http://github.com/twitter/flockdb#readme). It's a lightweight graph database that favors performance over complex graph operations. – Niels van der Rest Aug 20 '10 at 09:56
  • On the official Cassandra website, they define their type of NoSQL technique as key-value. Which one is right? http://planetcassandra.org/what-is-nosql/#nosql-database-types – Burak Karakuş Jan 31 '15 at 15:32
0

NoSQL refers to different design approaches, not only to a different query language, and the various databases offer different features. For example, column-oriented databases are used for large data warehouses, which might be used for OLAP.

This is similar to my question, where you'll find a lot of resources.

bua