
My company has been using Oracle for a long time, but we would like to find a NoSQL database as a replacement, for faster querying and flexible schema design.

I have tried MongoDB, which is probably the most popular NoSQL database nowadays. I connected it through Spring Data to run some simple queries; it was easy to set up and simple to code against. Since we use Spring MVC for web development, Spring Data seems well suited for integration.

However, I have heard that Cassandra has better read and write performance, especially in large-scale systems. I am not sure whether it is worth moving to Cassandra, nor how to compare the performance of MongoDB and Cassandra.

Here are some requirements for my system:

  • focus on article fetching
  • tagging of articles so users can easily search for their favorites or related articles
  • non-distributed system, but with load balancing and failover
  • Java based, with Spring MVC for web development
  • articles stored as XML
  • probably user-defined tables (collections) and fields (keys)

Therefore I would like to raise some questions:

  • Which database is most suitable for my case? Feel free to suggest databases other than MongoDB and Cassandra.
  • If I use Cassandra, which framework would be suitable for integrating with Spring MVC?

Thank you so much in advance.

fmchan
  • "My company has been used Oracle for a long time but we would like to look for a NoSQL database as a replacement for faster querying" good luck with that. – Mitch Wheat Jan 13 '14 at 04:45
  • Mitch Wheat, I do not understand what you mean. Do you think moving from Oracle to NoSQL is not a good idea? What are the reasons behind your opinion? – fmchan Jan 13 '14 at 04:48
  • Cassandra is too complicated for a simple thing like a blog. If your blog were to contain billions of records, okay, you might want to consider Cassandra. Otherwise, use whatever is fastest for you to develop with. – Vasyl Boroviak Jan 13 '14 at 09:23
  • @fmchan NoSQL databases are not drop-in replacements for relational databases. Most NoSQL databases have a completely different philosophy of dealing with data, which affects the whole software architecture of the applications that use them. If your application keeps doing what it is doing right now and you just write an abstraction layer mapping SQL 1:1 to another query language, the result will likely be very unsatisfying. If you want to try a NoSQL database, try it on a greenfield project. – Philipp Jan 13 '14 at 09:59
  • Thanks @Philipp. What I am going to do is develop a new system using NoSQL, not replace the original system, so it is not a drop-in replacement. I will redesign the whole software architecture of the application and the NoSQL database. Everyone, please focus on the comparison between NoSQL databases. Thank you. – fmchan Jan 13 '14 at 10:16
  • Thanks @Vasiliy Borovyak. If there are billions of records, are you saying Cassandra is a better choice than MongoDB? And if there aren't billions of records, are you saying Oracle is still a suitable choice? Thank you. – fmchan Jan 13 '14 at 10:22
  • The best source for datamodelling for Cassandra: http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/ – le-doude Jan 14 '14 at 08:46
  • Cassandra is great if you need a system that can handle an incredible amount of data and still scale simply. I love that there is no master, so if you manage your data redundancy right you will get perfect uptime with great performance. – le-doude Jan 14 '14 at 08:48
  • All distributed NoSQL databases are meant to deal with incredible amounts of data! – vivek mishra Jan 16 '14 at 04:09
  • @vivekmishra, thank you for the reminder. I didn't mention it, but that is true of my system, and that's why we need NoSQL. – fmchan Jan 21 '14 at 15:36

2 Answers


I have experience using Spring and Cassandra together, but I have always written my own data access layer.

Using the ORMs out there for Cassandra will not allow you to leverage its full power, and you will, most likely, introduce bugs because your SQL background will make you expect certain behaviours that are just not what Cassandra will give you.

My advice: write the code that accesses Cassandra yourself, and do not be afraid to denormalize A LOT. Think more about how you want to query (or find) your data than about the format in which you want to save it.
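To make the query-first idea concrete, here is a minimal sketch in plain Java. It is not Cassandra driver code: the two in-memory maps only stand in for two tables, each shaped around one query (fetch by id, list by tag). The `Article` record, the `ArticleStore` class, and all field names are assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical article shape; the field names are illustrative only.
record Article(String id, String title, String xmlBody, List<String> tags) {}

// In-memory stand-in for a hand-written data access layer.
// Each map plays the role of one table designed around one query.
class ArticleStore {
    // "Table" serving the query: fetch an article by its id.
    private final Map<String, Article> articlesById = new HashMap<>();
    // "Table" serving the query: list all articles carrying a given tag.
    private final Map<String, List<Article>> articlesByTag = new HashMap<>();

    // One logical write fans out to every query table (denormalization).
    // Updates and deletes are left out to keep the sketch short.
    void save(Article article) {
        articlesById.put(article.id(), article);
        for (String tag : article.tags()) {
            articlesByTag.computeIfAbsent(tag, t -> new ArrayList<>()).add(article);
        }
    }

    // Each read touches exactly one "table", with no joins.
    Optional<Article> findById(String id) {
        return Optional.ofNullable(articlesById.get(id));
    }

    List<Article> findByTag(String tag) {
        return List.copyOf(articlesByTag.getOrDefault(tag, List.of()));
    }
}

class DenormalizedStoreDemo {
    public static void main(String[] args) {
        ArticleStore store = new ArticleStore();
        store.save(new Article("a1", "Intro to NoSQL", "<article>...</article>",
                List.of("nosql", "database")));
        store.save(new Article("a2", "Spring MVC tips", "<article>...</article>",
                List.of("java", "spring")));

        System.out.println(store.findByTag("nosql").size());
        System.out.println(store.findById("a2").map(Article::title).orElse("not found"));
    }
}
```

The point of the sketch is the shape of `save`: the article is written once per query it must serve, so every read is a single lookup. With a real Cassandra cluster the same idea would translate to one table per query and one write per table, at the cost of extra storage and more work on writes.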

I also strongly recommend reading this amazing article: Cassandra Data Modeling Best Practices, part 1 and part 2.

Another DB that might suit your application better is CouchDB (I like using BigCouch). It is another document-based NoSQL database and is, in my opinion, superior to MongoDB. It offers a better solution for scaling and emphasizes availability (just like Cassandra).

I'd like to point you to this question about the difference between CouchDB and MongoDB.

As far as frameworks go, Play framework has a lot of plugins for working with NoSQL systems, so you might give it a try. You could try playorm, which is the last one I experimented with.

EDIT: I forgot to mention Kundera, which is also an ORM for Cassandra.

le-doude
  • Thank you so much, @ɭɘ ɖɵʊɒɼɖ 江戸. I appreciate you mentioning CouchDB, which I had not noticed before. Since then I have read some articles comparing MongoDB and CouchDB, but I am not sure whether CouchDB's performance is superior to MongoDB's, because there is no chart or test result to measure it. From what most people say, CouchDB would be faster than MongoDB for data that is not updated in real time. However, queries in MongoDB seem simpler and more user-friendly, and its storage footprint is smaller. Replication in both databases is no easier than in Cassandra. – fmchan Jan 15 '14 at 06:21
  • Kundera is an ORM framework, but as you said, an ORM would not leverage the full power of Cassandra. So are you recommending not using any ORM at all? And does your data access layer talk to Cassandra without an ORM? – fmchan Jan 15 '14 at 06:29
  • I would also like to ask whether you have experience with HBase. I have read some articles comparing different databases. Charts show that HBase's read and write performance is superior to Cassandra's, while Cassandra scales better than HBase. In my case, HBase seems to be the better choice when comparing only HBase and Cassandra. – fmchan Jan 15 '14 at 06:35
  • "Kundera is a ORM framework but as you said, ORM would not leverage the full power of Cassandra." I disagree with this. Kundera gives you switches to plug in custom Cassandra properties and to perform Cassandra CQL functions as well. As an object mapper, it maps a Cassandra table/column family to an object. – vivek mishra Jan 15 '14 at 17:57
  • I do advise against using ORMs with Cassandra. It's not about playORM or Kundera in particular; it is just that, in principle, ORMs are too generic. Cassandra almost requires you to leverage its write speed and denormalize heavily in order to get optimal read performance. – le-doude Jan 15 '14 at 22:59
  • You can perfectly well make an app work with ORMs, but in the end a tailored data access layer with proper data modelling for Cassandra will give you a system that can handle a great number of concurrent requests. If your system is just for a small number of users and you just need to persist some objects, I advise going with other solutions such as S3 or some document-based NoSQL solution. – le-doude Jan 15 '14 at 23:01
  • @fmchan I have no experience with HBase besides a quick "tutorial" app. – le-doude Jan 15 '14 at 23:03
  • As far as performance is concerned, we have compared the raw Thrift API with the Kundera Thrift client (using YCSB), and the delta is around 7-8%. You would never fetch a huge amount of data into memory (even with raw Thrift); you would always want to scroll through it. With ORMs you get an edge in modeling your data properly and making the system less error-prone. It takes many burdens off the application. An app always does more than simply read from and write to Cassandra/MongoDB/an RDBMS, and that is where data modeling (e.g. ORMs or object mappers) matters. – vivek mishra Jan 16 '14 at 03:59
  • @vivekmishra, thank you. ORMs may affect performance, and that probably applies to all databases, but I accept your point that a well-designed model and fewer errors are also worth considering. – fmchan Jan 21 '14 at 15:41

Choosing between Cassandra and MongoDB depends on the type of storage you need. MongoDB is primarily for document-based storage, where you get an edge from its various SQL-like features.

If you require a columnar database with high availability and multi-DC replication, go for Cassandra.

http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB

vivek mishra
  • Thank you. Those are good points highlighting the features of MongoDB and Cassandra, although I am still struggling to choose between them. – fmchan Jan 21 '14 at 15:46