I'm working on a Django application that consists of a scraper that scrapes thousands of store items (price, description, seller info) per day, and a Django-template frontend that lets the user access the data and view various statistics.
For example, the user can click on 'Item A' and get a detail view listing various statistics about it (like line graphs of price over time, a price distribution, etc.).
The user can also click on reports for individual 'scrapes' and get details such as the number of items scraped, the average price, etc.
All of these statistics are currently calculated in the view itself.
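In simplified form, the views look roughly like this (the `Item` and `PriceRecord` models and their fields are placeholders, not my exact schema):

```python
# views.py -- simplified sketch of the current approach
from django.db.models import Avg, Count
from django.shortcuts import get_object_or_404, render

from .models import Item, PriceRecord  # hypothetical models


def item_detail(request, item_id):
    item = get_object_or_404(Item, pk=item_id)
    # Every statistic is recomputed on each request, directly in the view.
    prices = PriceRecord.objects.filter(item=item).order_by("scraped_at")
    stats = prices.aggregate(avg_price=Avg("price"), num_points=Count("id"))
    price_series = list(prices.values_list("scraped_at", "price"))
    return render(request, "items/detail.html", {
        "item": item,
        "stats": stats,
        "price_series": price_series,
    })
```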
This all works well locally, on a small development database with roughly 100 items. In production, however, this database will eventually grow to 1,000,000+ rows, which leads me to wonder whether calculating the statistics in the view won't cause massive lag down the line (especially as I plan to extend the statistics with more complicated regression analysis, and perhaps some nearest-neighbour ML classification).
The advantage of the view-based approach is that the graphs are always up to date. I could of course also schedule a cron job to run the calculations every few hours (perhaps even on a different server). This would make accessing the information very fast, but would also mean the information could be a few hours old.
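A minimal sketch of what I have in mind for the cron-job variant, as a Django management command that precomputes per-item summaries into a one-row-per-item table (the `ItemStats` model and all names here are hypothetical):

```python
# management/commands/compute_stats.py -- sketch of the precomputation approach
from django.core.management.base import BaseCommand
from django.db.models import Avg, Max, Min

from myapp.models import Item, ItemStats, PriceRecord  # hypothetical models


class Command(BaseCommand):
    help = "Precompute per-item statistics so the views can read them cheaply."

    def handle(self, *args, **options):
        for item in Item.objects.iterator():
            agg = PriceRecord.objects.filter(item=item).aggregate(
                avg_price=Avg("price"),
                min_price=Min("price"),
                max_price=Max("price"),
            )
            # Upsert one summary row per item; the view then just reads this
            # instead of aggregating over the raw scrape data on every request.
            ItemStats.objects.update_or_create(item=item, defaults=agg)
```

A crontab entry along the lines of `0 */3 * * * python /path/to/project/manage.py compute_stats` would then refresh the summaries every three hours, which is where the staleness trade-off comes from.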
I've never really worked with data at this scale before, and was wondering what the best practices are.