
I've spent some time searching for answers on this to no avail... Apologies if it has been asked before.

Basically I have a query that aggregates data over several subqueries/tiers and then joins against a database on a linked server. In its simplest form (I can't really disclose the code) it looks like this:

select g.*, c.* from
  ( --query that only aggregates the next subquery and does a few case/when replacements
    select *, genericCaseStatementHere, sum(foo*moo) as value
    from
      ( --this is where I do most of the heavy lifting, joining several tables, and there is grouping etc.
        select * from table1 a
        inner join table2 b on a.id = b.id
        inner join table3 c on a.id = c.id
      ) f
  ) g
inner join (select * from OtherServer.database.dbo.table4 where blah = bleh) c
  on c.something = g.something

The problem: run separately and without the join (referring to g and c here), each segment returns the expected data in under 10 seconds. g is around 3500 rows and c is maybe 60 rows (each only a few columns). However, when I join them as shown above, the query runs for over 20 minutes (I killed it at that point; there was no result).
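
To be clear, each half was tested on its own, along these lines:

-- g on its own (the local aggregation): around 3500 rows, back in under 10 seconds
select g.* from
  ( same aggregate subquery as above ) g

-- c on its own (the remote pull): maybe 60 rows, also back in under 10 seconds
select * from OtherServer.database.dbo.table4 where blah = bleh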

Thinking the cross-server query was the cause of the problem, I pumped table c into a temp table and then tried joining again like this:

select g.*, c.* from 
     ( this whole chunk is the same so going to skip it) g
inner join #temptable c on c.something = g.something
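
For reference, #temptable was populated with something like this (same remote filter as before):

select *
into #temptable
from OtherServer.database.dbo.table4
where blah = bleh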

Again, this doesn't work. But if I feed both into temp tables and join those, it works in seconds:

  select g.* into #tempg from 
     ( this whole chunk is the same so going to skip it) g

  select g.* from #tempg g
  inner join #temptable c on c.something = g.something

While I could feasibly use this, I would rather have a working single query, or at least understand why the original query never finishes.
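
For completeness, the kind of single query I'm after would look something like this; I gather OPENQUERY may help by running the remote filter entirely on the other server before the join (a sketch only, names as above):

select g.*, c.* from
  ( same aggregate subquery as above ) g
inner join openquery(OtherServer, 'select * from database.dbo.table4 where blah = bleh') c
  on c.something = g.something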

I have seen posts online mentioning a subquery that references the outer query for each row (i.e. a correlated subquery), but those examples don't appear to be the same as my own. I'd hazard a guess that this is actually what is happening here, but I don't understand how it is, or how to prevent it. Any knowledge that could be shared on this would be great!

  • It sounds like the problem is with the query plan against the linked server. I don't know why it works better with two temp tables. You'd need to post a lot more detail about queries, indexes, servers, etc. if there's any hope of finding a way around this. This kind of workaround is pretty common with linked servers in my experience. – shawnt00 Sep 01 '16 at 17:11
  • Linked servers don't use the indexes as you'd expect. However, there is no magic bullet to help you if you can't share the script and schema; we're shooting in the dark. This is why SSIS is still used: you can connect, query and combine data between servers (even between versions, and with Oracle). – SQLMason Sep 01 '16 at 19:06
  • Have you looked at the execution plan yet? That should reveal what step is causing the problem. – Jonathan Allen Sep 01 '16 at 19:53
  • SQL is very smart, but the method of choice may be limited by what SQL thinks is available (limited statistics, poor choice of SQL keywords, improper joins, etc.). If you have SSMS, check the query execution plan or utilize SHOWPLAN, whichever is easiest. Check the estimated row counts as well. – clifton_h Sep 01 '16 at 23:29
  • Aka: if it helps, your temp-table example is plain to understand; SQL can optimize it well because it uses straight comparisons, whereas your other query at best uses four subqueries that look overly complicated. Any reason you cannot use a CTE? At least make SQL Server guesstimate. Try rethinking your queries. If you're still stuck, I will have an answer later. – clifton_h Sep 01 '16 at 23:35

0 Answers