Why is COUNT(DISTINCT fieldA) slower than COUNT(DISTINCT LTRIM(RTRIM(UPPER(fieldA))))?

Question

I am new to execution plans in SQL Server 2005, but this mystifies me.

When I run this code ...

((SELECT COUNT(DISTINCT StudentID)
 FROM vwStudentUnitSchedules AS v
 WHERE v.UnitActive = 1
   AND v.UnitOutcomeCode IS NULL
   AND v.UnitCode = su.UnitCode
   AND v.StartDate = su.StartDate
   AND v.StudentCampus = st.StudentCampus) - 1) AS ClassSize

To get class sizes, it timesout and running it generically, it takes like 30 secs

But when I run it with this slight modification ...

 ((SELECT COUNT(DISTINCT LTRIM(RTRIM(UPPER(StudentID))))
     FROM vwStudentUnitSchedules AS v
     WHERE v.UnitActive = 1
       AND v.UnitOutcomeCode IS NULL
       AND v.UnitCode = su.UnitCode
       AND v.StartDate = su.StartDate
       AND v.StudentCampus = st.StudentCampus) - 1) AS ClassSize

It runs almost instantly.

Is it because of the LTRIM() RTRIM() and UPPER() functions? Why would they make things go faster? I suppose it's because COUNT(DISTINCT is an aggregate that counts from left to right character by character? Yes StudentID is a VARCHAR(10).

Thanks

Sounds like you execute the queries in order, and the second one is cached. Do you flush any cached pages before execution? What takes time with the first query? Have you tried `set statistics io on`? Does the query plan look okay? — sisve, Oct 21 '13 at 04:46
Have you looked at the [execution plan](http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan)? — NullUserException, Oct 21 '13 at 04:55
Thanks Simon. I cleared the cache, shut down all browsers, and went into IE instead of FF, and yep.. it's fast still. Cheers — LoftyWofty, Oct 21 '13 at 05:01
Hello. If your StudentId is part of index, applying any function to this column will impact performance. This happen because the query ptimizer no longer know what index to use. If you look at the execution plan you should noticed the difference. — Gatej Alexandru, Oct 21 '13 at 05:28
Please try running the second query first. Is it still fast and the other one slow? If it works the other way round, the deviation might simply due to the fact the pages are already in memory after running the first query. — Fabian, Oct 21 '13 at 09:55
I think Gatej hit the answer spot on! For some reason the LTRIM(RTRIM(etc is telling the database, ".. look don't compare char by char, just compare the entire field using a Hash Match instead of a Nested Loop". That implies to me the database somehow has to be told that the index I am using above in the COUNT() is actually an INDEX and not a field to be compared using a sequential search. Like I said, I am not offay with exec plans, but I did a simple Graph view of the plan (before and after) and compared them visually. The main difference I see is that Hash Match compared to the Nested Loop. — LoftyWofty, Oct 21 '13 at 23:08
But on 2nd thoughts, it could be that the view is causing the dilema here. The view does not have any indexes defined. All the fields in the view are non-referential. And my apologies, StudentID is Char(10), not VarChar(10). However, the StudentID inside the view does come from a lookup table where StudentID is the primary key, and the first field in the view. — LoftyWofty, Oct 21 '13 at 23:12
Post the query plans. Maybe cardinality estimation was different and produced a different plan. Or then umber of groups was less. — usr, Oct 23 '13 at 09:34
@usr, how do you suggest I post an Exec plan here in StackOverflow? Screen dump? Graph version or line by line? Could you please assist how best to do this without clogging this thread with a massive page after page of an exec plan? Thanks — LoftyWofty, Oct 30 '13 at 02:00
You can upload the .sqlplan to some file sharing site or pastebin, or upload the graphical plan as an image (which is more convenient, although not as detailed). But I thought the issue was resolved for you? — usr, Oct 30 '13 at 09:02

score 2 · Accepted Answer · answered Oct 23 '13 at 06:55

2

cache of the query plan would certainly impact your speed on a second run.

Just a theory if that isn't the case perhaps its down to the trim. The select distinct is to match every string if these are shorter down to a trim its less characters maybe.

depending on you database engine also see if binary_checksum would make it any faster. if this works maybe my theory is right.

 ((SELECT COUNT(DISTINCT BINARY_CHECKSUM(LTRIM(RTRIM(UPPER(StudentID)))))

answered Oct 23 '13 at 06:55

Thomas Harris

434
3
14

Yes that also worked very fast as well. I changed it to SELECT COUNT(DISTINCT BINARY_CHECKSUM(StudentID)). The database seems to be doing char by char comparisons if I do not remove any excess white chars like spaces. – LoftyWofty Oct 25 '13 at 02:53

Why is COUNT(DISTINCT fieldA) slower than COUNT(DISTINCT LTRIM(RTRIM(UPPER(fieldA))))?

1 Answers1