1

I am trying to ignore duplicate records in CTE but I am not able to do that, It seems like a SELECT statement inside CTE does not allow to use ROWNUM() variable numrows to condition in WHERE clause as it is showing Invalid column name 'numrows' error while trying to do so.

SQL Query:

DECLARE @BatchID uniqueidentifier = NEWID();
DECLARE @ClusterID SMALLINT = 1;
DECLARE @BatchSize integer = 20000;
DECLARE @myTableVariable TABLE(EventID BIGINT,HotelID int, BatchStatus varchar(50),BatchID uniqueidentifier);

WITH PendingExtResSvcEventsData_Batch
    AS(
        SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows 
        FROM ExtResSvcPendingMsg t WITH (NOLOCK) 
        WHERE t.ClusterID = @ClusterID AND numrows = 1 AND NOT EXISTS      -- not allowed to use WHERE numrows = 1 here showing *Invalid Column Name*
              (select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched' 
                   and t2.EventID = t.EventID and t2.HotelID = t.HotelID)       
    )
    UPDATE PendingExtResSvcEventsData_Batch 
       SET BatchStatus='Batched',  
           BatchID = @BatchID
    -- WHERE numrows = 1 (not allowed to use WHERE here because of OUTPUT Clause)
    OUTPUT INSERTED.* INTO @myTableVariable

    SELECT e.ExtResSvcEventID,e.HotelID,e.ID1,e.ID2,e.ExtResSvcEventType,e.HostID,e.StatusCode,e.ChannelID,e.RequestAtTime,e.ProcessTime,e.DateBegin,e.DateEnd,
        e.StatusMsg,em.MsgBodyOut,em.MsgBodyIn,e.ChannelResID 
    FROM  ExtResSvcEvent e WITH (NOLOCK) 
        INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID     
        INNER JOIN ExtResSvcEventXML em with (nolock) on t.EventID = em.ExtResSvcEventID            
    ORDER BY e.ExtResSvcEventID 

I have also tried to use numrows in final SELECT like INNER JOIN @myTableVariable t ON e.ExtResSvcEventID = t.EventID AND t.numrows = 1 but this gives me a error i.e. The column reference "inserted.numrows" is not allowed because it refers to a base table that is not being modified in this statement.

How do I ignore the duplicate records while using SELECT in CTE?

Rahul Hendawe
  • 902
  • 1
  • 14
  • 39
  • use `distinct`? – Zohar Peled Mar 15 '18 at 15:16
  • yes DISTINCT .. – Sas Mar 15 '18 at 15:17
  • I believe a `DISTINCT` should do :-) – Jim Jones Mar 15 '18 at 15:19
  • Yes, I am currently using distinct in final SELECT but it causing performance impact on query and therefore I am trying to select the distinct records in CTE select itself. – Rahul Hendawe Mar 15 '18 at 15:19
  • Use distinct in the CTE? – Twelfth Mar 15 '18 at 15:24
  • @Twelfth: DISTINCT not allowed in CTE. – Rahul Hendawe Mar 15 '18 at 15:25
  • Well the error is because you can't use the alias. SELECT is execute after WHERE. So WHERE doesnt know the alias https://stackoverflow.com/questions/4596467/order-of-execution-of-the-sql-query – Juan Carlos Oropeza Mar 15 '18 at 15:27
  • @JuanCarlosOropeza - What Should I use instead of alias? – Rahul Hendawe Mar 15 '18 at 15:30
  • you either create a cte with the alias and the use another query over the cte or duplicate the source of the alias on the where condition. – Juan Carlos Oropeza Mar 15 '18 at 15:39
  • btw if you want help with the duplicate issue. Post sample data and expected output. I particularly won't spend much time trying to guess what your code does. – Juan Carlos Oropeza Mar 15 '18 at 15:44
  • Show us db schema, sample data, current and expected output. Please read [**How-to-Ask**](http://stackoverflow.com/help/how-to-ask) And here is a great place to [**START**](http://spaghettidba.com/2015/04/24/how-to-post-a-t-sql-question-on-a-public-forum/) to learn how improve your question quality and get better answers. [**How to create a Minimal, Complete, and Verifiable example**](http://stackoverflow.com/help/mcve) Try create a sample in http://rextester.com – Juan Carlos Oropeza Mar 15 '18 at 15:45
  • Distinct is allowed in cte, just not allowed in the recursive part of a recursive cte.... – Zohar Peled Mar 15 '18 at 15:51
  • @ZoharPeled: Yes right, Its also not allowed when we use Update over CTE and OUTPUT... `Cannot update the view or function 'PendingExtResSvcEventsData_Batch' because it contains aggregates, or a DISTINCT or GROUP BY clause, or PIVOT or UNPIVOT operator.` – Rahul Hendawe Mar 15 '18 at 15:55
  • Well, now that we wasted all this time on y, please show us x. [edit] your question to include sample data as DDL+DML and desired results. Further reading: [What is the XYProblem?](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) – Zohar Peled Mar 15 '18 at 15:59
  • You can use `numrows=1`, you just have to create a second CTE that looks like this: `SELECT * FROM PendingExtResSvcEventsData_Batch WHERE numrows=1` then have your update statement refer to the new CTE. (which is basically what @JaunCarlosOropeza said). – Alan Burstein Mar 15 '18 at 16:13

1 Answers1

1

You can't refer to the numrows column in the WHERE clause of the CTE because that column is not calculated at this point in the plan execution. You need to add a second CTE with a select statement where you can refer to the numrows column:

WITH Base AS (
    SELECT TOP (@BatchSize) t.EventID, t.HotelID, t.BatchStatus, t.BatchID, ROW_NUMBER() OVER (PARTITION BY t.EventID ORDER BY t.EventID) numrows 
    FROM ExtResSvcPendingMsg t WITH (NOLOCK) 
    WHERE t.ClusterID = @ClusterID 
            AND NOT EXISTS  (select 1 from ExtResSvcPendingMsg t2 where t2.BatchStatus = 'Batched' and t2.EventID = t.EventID and t2.HotelID = t.HotelID)      
), PendingExtResSvcEventsData_Batch AS (
    SELECT  EventID,
            HotelID,
            BatchStatus,
            BatchID
    WHERE   numrows = 1
)
UPDATE...

I can't vouch for the update statement working as you expect it but the PendingExtResSvcEventsData_Batch should now have one row per EventID.