0

Please help us with ideas for the following case. Our startup e-commerce solution currently serves about 5,000 online shops, and the number is growing. Each shop sells several hundred unique products (let's say the mean is 300). The daily sales data is processed and recorded into a PostgreSQL database: "date", "product_id", "brand_name", "qty_sold", "price", "total_daily_sales", "geo", and some other less important variables. The processing system produces up to 50 million records monthly (quantity of shops multiplied by products and days), so the database is snowballing.

The data analytics team generates ad hoc reports using SQL queries. For example, top 10 "brand_names" by sales volumes in a given "geo" in May (or Q1'23). Given the database's daily granularity (over 500 million records already and growing), those SQL query reports overload the system and the execution time increases each month.

Obviously, our primary task is to use aggregate daily data into monthly data (SUM, AVG, etc.). In addition, we're wondering if it's already the right time to spend efforts on implementing some BI tool to replace direct database queries.

I would appreciate ideas and experiences regarding the most efficient way to aggregate daily data into monthly. We can't decide which way to go: materialised views, creating a new table for monthly data or maybe using an external BI tool for aggregation and building custom reports. To provide more context, our current database operates on AWS RDS and we deployed mainly on AWS solutions if that matters.

DSSC
  • 1
  • You probably need to employ a developer or DBA or other professional who can take the time to understand your business and produce an appropriate solution. – Richard Huxton Jul 01 '23 at 22:30
  • I voted to close since this is opinion-based. That said, off the top of my head, I recommend beefing up your RDS instance, looking into moving your reporting data out to other RDS or Aurora instances for your BI workload, or checking out AWS Athena running against Parquet files on S3. You should also heed the advice from @RichardHuxton and engage with a data specialist. – Mike Organek Jul 01 '23 at 22:44

0 Answers0