Our App / Data
We have a Python app with User
s in Transaction
s that have Commission
s, Fee
s, etc., with Contact
s that receive EmailMessage
s, and which Activity
s take place on (Document
s uploaded, Status
changes, etc.).
Our Reports
We generate spreadsheet reports for our customers that detail such info as the number of documents uploaded to transactions, sums of various types of commissions earned, fees charged, activities, etc. These reports, in some cases, provide statistics for a customer's account, for each month in a given year (each month in its own row in the spreadsheet).
Our Problem
We've reached a point with our web app where certain spreadsheet reports that we generate are taking minutes to produce (all waiting on Postgres), despite efforts to optimize queries, add indexes, and despite that we use only SSDs and have enough RAM to fit the database into memory. Essentially, we've reached a scale where some basic reports are becoming too expensive to run as simple aggregation queries against our production database.
Solutions I Am Considering
- Denormalize statistics into existing tables in Postgres
- Cache statistics in Memcached
- Reduce / simplify queries by moving some of the crunching into Python
- Run expensive reports in a queue and notify admins when they're ready
- Store statistics in separate reporting tables (star schema, etc.)
- Sharding
I already use options 1-4 above to an extent, but I would like to explore more options. Also, I'd like to stop using option 4 entirely, if possible, and I'm not too keen on implementing option 5 (vs simply using something like Redshift). Option 6 is a great option in some cases, but it's not something we're prepared to take on at the moment.
Where Should I Look?
I started actually looking into Redshift, but something that threw me for a loop this morning was reading (here) that "It is not a real-time analysis engine." Does this also mean "it is not useful for generating reports within a single web request", or is it more likely that this blog is stating that it's not useful for real-time apps (online games, etc.)?
I've also looked at Quicksight, but it appears to be more for building business dashboards for ourselves, rather than generating reports for our users.
How would you approach this problem, given the above information? Is Redshift the obvious answer and my above concerns about not good for real-time are moot? Is there some other service or tool or methodology that would make more sense for you in a situation like this?