0

I am trying to change my sub-query in to a join where it selects only one record in the sub-query. It seems to run the sub-query for each found record, taking over a minute to execute:

select afield1, afield2, (
    select top 1 b.field1
    from   anothertable as b
    where  b.aForeignKey = a.id
    order by field1
) as bfield1
from sometable as a

If I try to only select related records, it doesn't know how to bind a.id in the nested select.

select afield1, afield2, bfield1
from   sometable a left join (
    select top 1 id, bfield, aForeignKey 
    from   anothertable
    where  anothertable.aForeignKey = a.id
    order by bfield) b on
       b.aForeignKey = a.id
-- Results in the multi-part identifier "a.id" could not be bound

If I hard code values in the nested where clause, the select duration drops from 60 seconds to under five. Anyone have any suggestions on how to join the two tables while not processing every record in the inner table?

EDIT:

I ended up adding

left outer join (
    select *, row_number() over (partition by / order by) as rank) b on
    b.aforeignkey = a.id and b.rank = 1

went from ~50 seconds to 8 for 22M rows.

Zachary Scott
  • 20,968
  • 35
  • 123
  • 205

2 Answers2

2

Try this:

WITH qry AS
(
    SELECT afield1, 
           afield2, 
           b.field1 AS bfield1,
           ROW_NUMBER() OVER(PARTITION BY a.id ORDER BY field1) rn
      FROM sometable a LEFT JOIN anothertable b
        ON b.aForeignKey = a.id
)
SELECT *
  FROM qry
 WHERE rn = 1
Chandu
  • 81,493
  • 19
  • 133
  • 134
  • Wouldn't this select all rows then select which rows would be first? I would think this would be the same as `sometable a join (select * from anothertable b) b on a.id = b.aForeignKey` If these are the same, the query goes from 5 seconds to nearly a minute, but is faster than the subquery executing for each found record. +1 for a working solution. Maybe create a table valued function? – Zachary Scott Jul 18 '11 at 19:15
  • You are definitely on the right track. I converted your CTE to an in-line table valued function that accepted parameters for the primary key or null values for them if you want to select all, including a rank (row_number) like above. Now I can `select * from dbo.sometablef( /* aForeignKey */ 1) where rank = 1`. Thanks! – Zachary Scott Jul 18 '11 at 19:52
  • Only problem was the TVF parameters had to be hard coded, so I made the TVF with your rank above and then joined where rank = 1. Worked great. – Zachary Scott Jul 19 '11 at 19:31
  • Glad to know the query was helpful. – Chandu Jul 19 '11 at 19:38
0

Try this

select afield1, 
afield2, 
bfield1 
from sometable a 
left join 
(select top 1 id, bfield, aForeignKey from  anothertable where  aForeignKey in(a.id)  order by bfield) b on b.aForeignKey = a.id 
Rahul
  • 76,197
  • 13
  • 71
  • 125