1

if I run this query

select user from largetable where largetable.user = 1155 

(note I'm querying user just to reduce this to its simplest case)

And look at the execution plan, an index seek is planned [largetable has an index on user], and the estimated rows is the correct 29.

But if I do

select user from largetable where largetable.user = (select user from users where externalid = 100)

[with the result of the sub query being the single value 1155 just like above when i hard code it]

The query optimizer estimates 117,000 rows in the result. There are about 6,000,000 rows in largetable, 1700 rows in users. When I run the query of course I get back the correct 29 rows despite the huge estimated rows.

I have updated stats with fullscan on both tables on the relevent indexes, and when I look at the stats, they appear to be correct.

Of note, for any given user, there are no more than 3,000 rows in largetable.

So, why would the estimated execution plan show such a large number of estimated rows? Shouldn't the optimizer know, based on the stats, that it's looking for a result that has 29 corresponding rows, or a MAXIMUM of 3,000 rows even if it doesn't know the user which will be selected by the subquery? Why this huge estimate? The problem is, that this large estimate is then influencing another join in a larger query to do a scan instead of a seek. If I run the larger query with the subquery, it takes 1min 40 secs. If run it with the 1155 hard coded it takes 2 seconds. This is very unusual to me...

Thanks,

Chris

Lorenzo
  • 29,081
  • 49
  • 125
  • 222
Querylous
  • 11
  • 3

2 Answers2

1

The optimizer does the best it can, but statistics and row count estimations only go so far (as you're seeing).

I'm assuming that your more complex query can't easily be rewritten as a join without a subquery. If it can be, you should attempt that first.

Failing that, it's time for you to use your additional knowledge about the nature of your data to help out the optimizer with hints. Specifically look at the forceseek option in the index hints. Note that this can be bad if your data changes later, so be aware.

Donnie
  • 45,732
  • 10
  • 64
  • 86
  • `forceseek` is 2008 only. The OP is on 2000. – Martin Smith Dec 01 '10 at 15:40
  • Ah, didn't realize that. Migrated off 2000 a while ago so I didn't have anything to test it against. :( – Donnie Dec 01 '10 at 15:43
  • Thanks Donnie; I get the same effect when I rewrite as a join. I don't think SS 2000 has the forceseek, but, I'll check on that.. – Querylous Dec 02 '10 at 15:37
  • Oops, see someone agrees with me. I fixed this problem by just doing an addnl query to get subquery result, then dropping in to main query. But, it's just really weird to me why optimizer did this! – Querylous Dec 02 '10 at 15:39
  • @Querylous - On page 226 of "Inside Microsoft SQL Server 2005: Query Tuning and Optimization" this method is specifically mentioned if you have a copy... – Martin Smith Sep 11 '11 at 12:47
0

Did you try this?

SELECT lt.user
FROM Users u
     INNER JOIN largeTable lt
        ON u.User = lt.User
WHERE u.externalId = 100

Please see this: subqueries-vs-joins

Community
  • 1
  • 1
clyc
  • 2,420
  • 14
  • 15