3

I asked a question yesterday and one answer is making me think about performance.

Resuming, I have a table which represents a parenthood relationship:

PARENT | CHILD
   1   |   2
   1   |   3
   2   |   4

Both fields are numbers that represents one person.

I was needing to take the group of distinct persons of this table, not importing if child or parent. The query that came first in my mind was the most obvious one:

 SELECT DISTINCT PARENT FROM TABLE1
 UNION SELECT DISTINCT CHILD FROM TABLE1

But the one bellow seems to perform much better (in my real data at least):

 SELECT DISTINCT CASE WHEN N.n=1 THEN parent ELSE child END 
 FROM TABLE1
 CROSS APPLY(SELECT 1 UNION SELECT 2)N(n)

My questions are:

  • Is this second query really faster than the first I built always?
  • Just for curious, is there a faster way to do it?
Community
  • 1
  • 1
Nizam
  • 4,569
  • 3
  • 43
  • 60

2 Answers2

2

Firs query have high IO cost and low CPU cost than second query. second query have low IO and more CPU than first query.

I suggest that use second query, because IO have more affect in performance than CPU. if you can reduce IO of your query and increase CPU cost is better that reduce CPU cost and increase IO cost.

mehdi lotfi
  • 11,194
  • 18
  • 82
  • 128
  • Thank you for helping me there (on the other question) and helping me here. It took me a while to completely understand this query. – Nizam Jul 24 '14 at 04:13
2

Try following two queries

SELECT PARENT FROM TABLE1
UNION SELECT CHILD FROM Table1

UNION will do distinct for you. There is no need to use DISTINCT in sub query. This way, you can reduce DISTINCT SORT operator from 2 to 1. It also remove the need to MERGE JOIN two sub query.

SELECT DISTINCT Id
FROM 
(
   SELECT PARENT, CHILD
   FROM TABLE1
) AS S
UNPIVOT
(
   Id FOR AccountType IN ([Parent], [Child])
) AS UP

It also scans table once, but doesn't introduce any new constant.

Here is query cost in my machine with sample data

  • Query1 in question: 40%
  • Query2 in question: 23%
  • Query1 in my answer: 20%
  • Query2 in my answer: 17%

I'm not able to predicate query cost over large volume data. It might change. Have a try with your own data.

qxg
  • 6,955
  • 1
  • 28
  • 36