1

What's a fast way to query large amounts of data (between 10,000 and 100,000 rows, and it will get bigger in the future, maybe 1,000,000+) spread across multiple tables (20+), involving left joins and aggregate functions (SUM, MAX, COUNT, etc.)?

My solution would be to make one table that contains all the data I need, and have triggers that update this table whenever one of the other tables gets updated. I know that triggers aren't really recommended, but this way I take the load off the querying. Or do one big update every night. I've also tried views, but once they start involving left joins and calculations they're way too slow and time out.
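A minimal sketch of what I have in mind, with hypothetical Orders / ReportSummary tables (the real schema has 20+ tables):

    -- Hypothetical denormalised reporting table, kept current by a trigger.
    CREATE TABLE dbo.ReportSummary (
        CustomerId  INT PRIMARY KEY,
        OrderCount  INT NOT NULL,
        TotalAmount DECIMAL(18, 2) NOT NULL
    );
    GO

    CREATE TRIGGER trg_Orders_UpdateSummary
    ON dbo.Orders
    AFTER INSERT, UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- Recompute summary rows only for the customers that changed.
        -- (Customers whose last order was deleted would need extra handling.)
        WITH Affected AS (
            SELECT CustomerId FROM inserted
            UNION
            SELECT CustomerId FROM deleted
        )
        MERGE dbo.ReportSummary AS target
        USING (
            SELECT o.CustomerId,
                   COUNT(*)      AS OrderCount,
                   SUM(o.Amount) AS TotalAmount
            FROM dbo.Orders o
            WHERE o.CustomerId IN (SELECT CustomerId FROM Affected)
            GROUP BY o.CustomerId
        ) AS src
        ON target.CustomerId = src.CustomerId
        WHEN MATCHED THEN
            UPDATE SET OrderCount = src.OrderCount, TotalAmount = src.TotalAmount
        WHEN NOT MATCHED THEN
            INSERT (CustomerId, OrderCount, TotalAmount)
            VALUES (src.CustomerId, src.OrderCount, src.TotalAmount);
    END;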

Gabriel Andrei
  • Are you sure this is the best design? In my experience 100,000 records isn't really that big. Putting everything in one table denormalises the data; I wouldn't (usually) do that. – HoneyBadger Nov 26 '15 at 11:56
  • It will reach the million mark, so I have to consider that too. This has to be fast because once I've queried the data, I also need to use it. It's like exporting a few hundred thousand records, and every join or function that gets added to the filtering makes it a lot slower. – Gabriel Andrei Nov 26 '15 at 12:21
  • Can you clarify "spread across multiple tables (20+) that involves left joins, functions"? Do you have to run a single query that performs joins across all of those tables? Or do you perhaps have some row partitioning across multiple tables, for example tables with historical data in them? – Vanlightly Nov 26 '15 at 18:19

2 Answers

2

Since your question is too general, here's a general answer...

The path you're taking right now is optimizing a single query/single issue. Sure, it might solve the issue you have right now, but it's usually not very good in the long run (not to mention the cumulative cost of maintenance of such a thing).

The common path is to create an 'analytics' database: a copy of your production database that you query for all your reports. This analytics database can eventually grow into a full-blown DWH, but you're probably going to start with simple replication (real-time, nightly, or whatever) and work from there...
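For the nightly variant, the refresh itself can be a single scheduled job that rebuilds a reporting table. A minimal sketch, assuming made-up prod/analytics schemas and tables:

    -- Hypothetical nightly refresh (e.g. run by a SQL Server Agent job).
    -- Wrapped in a transaction so readers never observe a half-loaded table.
    BEGIN TRANSACTION;

    TRUNCATE TABLE analytics.CustomerReport;

    INSERT INTO analytics.CustomerReport (CustomerId, OrderCount, TotalAmount, LastOrderDate)
    SELECT c.CustomerId,
           COUNT(o.OrderId),
           COALESCE(SUM(o.Amount), 0),
           MAX(o.OrderDate)
    FROM prod.Customers c
    LEFT JOIN prod.Orders o ON o.CustomerId = c.CustomerId
    GROUP BY c.CustomerId;

    COMMIT TRANSACTION;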

As I said, the question/problem is too broad to be answered in a couple of paragraphs; these are only some of the guidelines...

veljkoz
1

I'd need a bit more detail, but I can already suggest this:

  1. Use "with(nolock)", this will slightly improve the speed.

Reference: Effect of NOLOCK hint in SELECT statements
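A minimal example with a hypothetical table:

    -- Hypothetical example: WITH (NOLOCK) skips shared locks, so the query
    -- doesn't wait on writers, but it may read uncommitted (dirty) rows.
    SELECT o.CustomerId, SUM(o.Amount) AS TotalAmount
    FROM dbo.Orders o WITH (NOLOCK)
    GROUP BY o.CustomerId;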

  2. Use indexing on the fields you join and filter on, so the data can be fetched quickly (example below).

Reference: sql query to select millions record very fast
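For instance, a covering index on the join/filter column (names hypothetical):

    -- Hypothetical example: a nonclustered index on the join key that also
    -- INCLUDEs the columns the aggregates read, so the base table isn't scanned.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON dbo.Orders (CustomerId)
    INCLUDE (Amount, OrderDate);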

Paul0PT