0

I have 2 tables combined with union all operator in procedure.First table contains 20 million records and second table contain 1 million records. if i use Top Clause alone without distinct clause ,it gives the output but when i used TOP clause with Distinct clause,it returns first 800 records very fast after that query is executing without completion.is it correct method to use both(Distinct and Top) in same query?

SELECT Distinct TOP 1000
            TP.F_PRODUCT AS ID,
            TP.F_PRODUCT_NAME AS [NAME],
            TP.F_LANGUAGE AS LANGCODE,
            TP.F_FORMAT AS FMTCODE,
            TP.F_CUSTOM1 AS TN,
            TP.F_CUSTOM2 AS CP,
        FROM 
            T_PDF TP WHERE TP.F_PRODUCT <>''
    UNION ALL

    SELECT Distinct TOP 1000
            TP.F_PRODUCT AS ID,
            TP.F_PRODUCT_NAME AS [NAME],
            TP.F_LANGUAGE AS LANGCODE,
            TP.F_FORMAT AS FMTCODE,
            TP.F_CUSTOM3 AS TN,
            TP.F_CUSTOM4 AS CP,
        FROM 
            T_HTML TP WHERE TP.F_PRODUCT <>''
Ram
  • 727
  • 2
  • 16
  • 33

1 Answers1

1

There is nothing wrong with using TOP and DISTINCT, regardless of the UNION ALL construction present. If that's the data you need then that's the way to do it.

However, when you ask for a DISTINCT you need to realize that the system potentially has to go over a lot of records to make sure it fetches sufficient 'raw data' to get to the requested number of DISTINCT values; worst case it has to run over all 20 million records! MSSQL is pretty good at guesstimating how many rows it will need by making use of the statistics it has on the data at hand.

Now, potentially your statistics are 'way off' causing the system to fetch 'too little' records resulting in those 800 'quick results' you get but then taking a lot of time to fetch the next 200 (distinct values) from the table.

I'd suggest to try to do 2 things:

  • ask for an estimated plan and learn to interpret it
  • update the statistics for said tables and then try again, see if the estimated plan changed; especially the Estimated Number of rows should be interesting

Good luck, Roby

PS: keep in mind that when asking for TOP n you'll get a 'random selection' of the entire data; there is no guarantee whatsoever that you'll get the 'first' n rows from the table! To get there you'll need to explicitly specify an ORDER BY clause, potentially adding (a lot of) extra work to the execution of the query; again, the Query Plan will show this. (You can type both queries at once and ask for an estimated plan to see the differences. That said, when 1 query has a cost of 10% and the other of 90%, this does not mean that one will run 9 times faster than the other, cost is not the same as time, although there indeed is a link between both, just not a linear one)

deroby
  • 5,902
  • 2
  • 19
  • 33
  • Correct.Now my select statement in the stored procedure picks 910 records very fast ,after that it takes more time to complete the execution.Whether i have to use any HINT to complete the execution fast or any method to make it fast to pick first 1000 records? – Ram Feb 20 '19 at 16:43
  • 1
    Well, there is the `FAST` hint, but I'm not sure if this would be a good match here. cf https://stackoverflow.com/questions/1308946/should-i-use-query-hint-fast-number-rows-fastfirstrow – deroby Feb 22 '19 at 21:11