14

I am looking for a database or storage mechanism that gives me high-performance writes and reads.

This storage will hold logging-style records of important information from multiple systems. Because the logged data is critical and will be used to show history, read performance must be fast. We never update or delete these records and never join them, so I am looking for a solution that fits that profile. We will probably archive the data in the long run, but that is something we can deal with.

I have looked at several sources to understand the different NoSQL databases, but an expert's opinion is always better :)

Must Have:
1. Fast reads without failure
2. Fast writes without failure
3. Random-access performance
4. Replication with failover: if one node goes down, another should immediately be up and working
5. Concurrent reads and writes

Good to Have:
1. Content search, e.g. analysing the data for auditing, with or without indexes

Not Required:
1. Transactions are not required at all
2. Update never happens
3. Delete never happens
4. Joins are not required

Referred: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

RaceBase
  • Have you considered a flat file? I once consulted to a lottery company. They had very stringent requirements. They used flat files, for fast and reliable read, write, and seek. – Mike Dunlavey Nov 13 '14 at 18:40
  • I just don't understand how some folks mark legitimate questions as "off topic"... – Jeryl Cook Nov 18 '15 at 20:25
  • You need something like Hadoop with streaming. BigQuery is a SaaS option, though I would recommend it for experimental purposes only. – themihai Oct 19 '16 at 11:14

3 Answers

20

Disclosure: Kevin Porter is a Senior Software Engineer at Aerospike, Inc. since May 2013. (ref)

Be sure to consider Aerospike. It dominates in the adtech space, where high-throughput reads and writes are required. Aerospike is frequently touted as having "the speed of Redis with the scalability of Cassandra." For searching/querying, see Aerospike's secondary index documentation.

For more information, see the discussions and articles below:

  1. Aerospike vs Cassandra
  2. Aerospike vs Redis and Mongo
  3. Aerospike Benchmarks

Lastly, verify the performance for yourself with the One million TPS on EC2 Instructions.

kporter
  • Thanks for the suggestion. As I mentioned in my post, read, write, and search operations should all be fast. But from what I have read, Aerospike is an in-memory store, whereas Cassandra is disk-based. We won't be able to provide that much RAM, as this data will be part of analytics. – RaceBase Nov 13 '14 at 18:28
  • Actually, Aerospike isn't only an in-memory database. The most widely deployed storage model is [Hybrid storage](http://www.aerospike.com/docs/architecture/storage.html#hybrid-storage), where there is a 64-byte index entry for each record in RAM and the data itself is stored on flash storage (SSD). – kporter Nov 13 '14 at 20:42
  • As per SO rules, you are [required](http://meta.stackexchange.com/questions/57497/limits-for-self-promotion-in-answers) to disclose your affiliation with Aerospike. Don't get me wrong, I love it and I'm sure it's the man for the job :) – Renato Massaro Jun 07 '15 at 11:10
6

Let me be the Cassandra sponsor.

Disclaimer: I am not saying Cassandra is better than the others. I don't know Mongo, Redis, and the rest deeply enough, and I don't want to get into that kind of debate.

I suggest Cassandra because your needs match what it offers, and the things you list as not required are features that are either not supported in Cassandra (joins, for instance) or considered an anti-pattern (deletes and, in some situations, updates).

From your "Must Have" list, point by point:

  1. Fast reads without failure: Supported. You can choose the consistency level of each read operation, trading off how important it is to retrieve the freshest data against how important speed is.

  2. Fast writes without failure: Same as point 1; the consistency level is chosen per write operation.

  3. Random-access performance: In the Cassandra world, many parameters affect random-access performance, but the most important one that comes to mind is the data model. If you create a data model that scales horizontally (take a look here) and you avoid hotspots, you get what you need. If you model your DB well, you should get O(1) for each operation, since the data is structured around the queries you run.

  4. Replication: Here Cassandra is even better than you might think. If one node goes down, nothing changes for the cluster and everything(*) keeps working. Cassandra has no single point of failure. With an older Cassandra version I have had an uptime of more than 3 years.

  5. Concurrent reads and writes: Cassandra uses the LWW (last-write-wins) policy to handle concurrent writes to the same key. The system supports many concurrent reads and writes and, with newer protocol versions, asynchronous operations as well.
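To make point 3 a bit more concrete, here is a minimal sketch of one common way to avoid hotspots for log-style data: bucketing entries by source system and day. The function names and the bucketing scheme are my own assumptions for illustration, not something prescribed by Cassandra, and the modulo placement is only a toy stand-in for Cassandra's token ring.

```python
import hashlib
from datetime import datetime, timezone
from typing import Tuple


def partition_key(system_id: str, event_time: datetime) -> Tuple[str, str]:
    """Hypothetical partition key: (source system, UTC calendar day).

    Bucketing by day keeps any single partition from growing without bound,
    and including the system id spreads writes across partitions.
    """
    day_bucket = event_time.astimezone(timezone.utc).strftime("%Y-%m-%d")
    return (system_id, day_bucket)


def owning_node(key: Tuple[str, str], node_count: int) -> int:
    """Toy placement: hash the key and map it onto a node index.

    Real Cassandra assigns token ranges on a ring (Murmur3 partitioner by
    default); the modulo here is only an illustration.
    """
    digest = hashlib.md5("|".join(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


key = partition_key("billing", datetime(2014, 11, 13, 18, 40, tzinfo=timezone.utc))
node = owning_node(key, node_count=6)
```

Reading the history of one system for one day then touches a single partition, which is the "structured to be queried" idea from point 3.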

Cassandra offers lots of other interesting features. Linear horizontal scaling is the one I appreciate most, but you can also find out the instant at which every piece of data was last written (the LWW timestamp), and there are counters and so on.
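As a rough illustration of last-write-wins and its timestamp, here is a small in-memory sketch. The `LwwStore` and `Cell` names are hypothetical, and keeping the existing value on an exact timestamp tie is a simplification chosen for this sketch, not a statement about Cassandra's tie-breaking.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Cell:
    value: str
    write_ts: int  # writer-supplied timestamp, e.g. microseconds since epoch


class LwwStore:
    """Toy key-value store that resolves conflicting writes by timestamp."""

    def __init__(self) -> None:
        self._cells: Dict[str, Cell] = {}

    def write(self, key: str, value: str, write_ts: int) -> None:
        current = self._cells.get(key)
        # Only a strictly newer timestamp replaces the stored cell, so the
        # arrival order of the writes does not matter.
        if current is None or write_ts > current.write_ts:
            self._cells[key] = Cell(value, write_ts)

    def read(self, key: str) -> Optional[Cell]:
        return self._cells.get(key)


store = LwwStore()
store.write("event-1", "newer", write_ts=200)
store.write("event-1", "older", write_ts=100)  # arrives late, is ignored
cell = store.read("event-1")
```

After both writes, `cell` holds the value written with timestamp 200, and that timestamp is available alongside the value.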

(*) - as long as you don't use consistency level ALL which, IMHO, should NEVER be used in such a system.
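The arithmetic behind that footnote can be sketched as follows. This covers only three of Cassandra's consistency levels (ONE, QUORUM, ALL), and the helper names are made up for this example.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must acknowledge an operation at a given level."""
    levels = {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return levels[level]


def tolerated_failures(level: str, replication_factor: int) -> int:
    """How many replicas of a key can be down while the operation still succeeds."""
    return replication_factor - replicas_required(level, replication_factor)


def reads_see_latest_write(read_level: str, write_level: str, replication_factor: int) -> bool:
    """True when read and write replica sets must overlap (R + W > RF)."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor
```

With a replication factor of 3, `tolerated_failures("ALL", 3)` is 0, so a single down replica fails the operation, while QUORUM tolerates one down replica and QUORUM reads still overlap QUORUM writes.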

Carlo Bertuccini
  • At present I am comparing Elasticsearch and Cassandra; both made it onto my final list. Can you point me to any article or information on the limitations of each, so that I can consider the future architecture and make a choice? – RaceBase Nov 13 '14 at 02:55
  • They're two different solutions, arguably made to coexist rather than to compete. Cassandra is a storage system, while ES is a full-text search engine based on Lucene. DataStax Enterprise is a product similar to the combination just described: it uses Solr as the full-text search engine and Cassandra to persist data and perform exact searches. – Carlo Bertuccini Nov 13 '14 at 06:50
  • I used Cassandra in my solution, but read performance for the same data (fetched by exact key) degrades as the data size increases, which should not happen. – Atmesh Mishra May 09 '17 at 11:58
5

Here are a few more links on how you can span in-memory and disk storage (DRAM, SSD, and disk) with Aerospike:

http://www.aerospike.com/hybrid-memory/

http://www.aerospike.com/docs/architecture/storage.html
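Since RAM was the concern raised in the comments above, here is a back-of-the-envelope sketch of the index memory in the hybrid model, using the 64-byte per-record index entry quoted there. The replication factor of 2 and the record count are assumptions for illustration only; check the sizing documentation for your own deployment.

```python
INDEX_ENTRY_BYTES = 64  # per-record index size quoted in the comments above


def index_ram_bytes(record_count: int, replication_factor: int = 2) -> int:
    """Cluster-wide RAM for the primary index; record data itself lives on SSD.

    replication_factor=2 is an assumed value for this example.
    """
    return record_count * INDEX_ENTRY_BYTES * replication_factor


# Hypothetical workload: one billion log records.
total = index_ram_bytes(1_000_000_000)
print(f"{total / 10**9:.0f} GB of RAM across the cluster")
```

The point is that RAM grows with the number of records, not with the size of each log entry.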

I think everyone is right in terms of matching the specific DB to your specific use case. For instance, Aerospike is optimal for key-value data. Other options might be better.

By way of analogy, I'll always remember how, decades ago, a sister of mine once borrowed my computer and wrote her term paper in Microsoft Excel. Line after line was a different row of a spreadsheet. It looked ugly as heck, but, uh, okay. She got the task done. She cursed and swore at how difficult it was to edit the thing. No kidding!

Choosing the right NoSQL database for the task will make your job a breeze; deciding on the wrong basic tool for the task at hand could cause you to curse a blue streak.

Of course, every vendor's going to defend their product. I think it's best the community answer the question. Here's another Stack Overflow thread answering a similar question:

Has anyone worked with Aerospike? How does it compare to MongoDB?

btw: Do you have any more specific insights for us on what type of problem you are trying to solve?

Peter Corless