I need to build an analytics (reporting, charting & graphing) system into my Django application. In an ideal world I could just query my Postgres DB and get the data I want, but once the amount of data in the DB goes through the roof, I'll hit performance bottlenecks and other issues like index hell.
I'm wondering if you could point me in the right direction to implement this:
- Is this a good scenario for a NoSQL DB (like CouchDB, MongoDB, or Redis), querying the data from that?
- Since Postgres and Django have no OLAP/MDX support, should I build a star schema in a different database and query that?
I'm looking to avoid two things:
- I don't want to query my actual DB for analytics, as it might take a huge performance hit.
- I'd like to keep my analytics as up to date as possible, i.e. I'd like to incrementally update my data warehouse so it always has the latest data: every time there's a CRUD operation on my transactional DB, I'd like to update the data warehouse.
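To make the second point concrete, here is a minimal sketch of the incremental pattern I have in mind, in plain Python with an in-memory class standing in for the analytics store (all names here, like `SalesAggregates` and `on_order_saved`, are made up for illustration; in Django the hook would live in a `post_save` signal receiver):

```python
# Sketch: keep precomputed aggregates and apply a small delta on each
# write, instead of re-scanning the transactional tables per report.
# SalesAggregates is an illustrative in-memory stand-in for a real
# analytics store (Redis/CouchDB/star schema).

class SalesAggregates:
    """Holds precomputed analytics, updated incrementally."""

    def __init__(self):
        self.revenue_by_day = {}   # e.g. {"2012-05-01": 149.90}
        self.order_count = 0

    def apply_order(self, day, amount):
        # O(1) per write; no full recomputation over the source data.
        self.revenue_by_day[day] = self.revenue_by_day.get(day, 0.0) + amount
        self.order_count += 1


# In Django this function would be wired to the Order model via
# django.db.models.signals.post_save (and post_delete for removals).
def on_order_saved(aggregates, order_day, order_amount):
    aggregates.apply_order(order_day, order_amount)
```

Hooked into `post_save`/`post_delete` signals, something like this would keep the analytics store in sync with every CRUD operation on the transactional DB.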
This is yet another scenario that I haven't worked with and am trying to understand the quickest and best way to accomplish.
I hope I've been verbose enough. If not, I'd gladly explain more.
Thanks everyone
After digging around the web and using the knowledge I have, I've come to this solution:
Use Postgres to store the relational data. On every CRUD operation, call the analytics code to do the calculations on the data and store the results in a NoSQL DB like Redis/CouchDB.
Looking at this good comparison of NoSQL DBs (http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis), I think Redis or CouchDB fits in just fine; both are a good target for analytics.
I could store the calculated analytics in Redis/CouchDB and update them incrementally whenever my source data changes.
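With Redis specifically, incremental counters map naturally onto its `INCRBY`/`HINCRBY` commands. A rough sketch of what I mean (the `FakeRedis` class below is a tiny in-memory stand-in so the example is self-contained; with a real server you'd use `redis.Redis()` from redis-py, which exposes the same `hincrby`/`hgetall` methods):

```python
# Incrementally maintained counters kept in Redis hashes.
# On the real server, HINCRBY is atomic, so concurrent CRUD
# hooks don't race with each other.

class FakeRedis:
    """Minimal in-memory stand-in for redis-py's client (illustrative)."""

    def __init__(self):
        self._hashes = {}

    def hincrby(self, name, key, amount=1):
        h = self._hashes.setdefault(name, {})
        h[key] = h.get(key, 0) + amount
        return h[key]

    def hgetall(self, name):
        return dict(self._hashes.get(name, {}))


def record_pageview(r, page_id, day):
    # Called from the CRUD hook: one hash per day,
    # one field per page.
    r.hincrby("pageviews:%s" % day, page_id, 1)


r = FakeRedis()  # with a real server: r = redis.Redis()
record_pageview(r, "home", "2012-05-01")
record_pageview(r, "home", "2012-05-01")
record_pageview(r, "about", "2012-05-01")
print(r.hgetall("pageviews:2012-05-01"))  # {'home': 2, 'about': 1}
```

The reporting views would then read these precomputed hashes directly instead of touching the transactional Postgres DB.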
Is this a good solution?