2

I am working with SAP Timesheet data, so there are millions of rows. What I am trying to do is select the data from the SAP table and insert it into a table on MS SQL Server.

So I want to insert the original record, then if an update to the original record happens, which is in the form of a new SAP record with a refcounter, I want to find the original record in my table and update it, keeping the original counter value.

So I have done this successfully with a cursor (I know not the best), but with millions of records, I am wondering if there is a faster way, because I am on day 4 of my cursor running. Is there a better way then what I have below:

BEGIN
    CREATE TABLE CATSDB 
        (
            [COUNTER] nvarchar(12),
            REFCOUNTER nvarchar(12),
            PERNR nvarchar(8),
            WORKDATE nvarchar(8),
            CATSHOURS decimal(7, 3),
            APDAT nvarchar(8),
            LAETM nvarchar(6),
            CATS_STATUS nvarchar(2),
            APPR_STATUS nvarchar(2)
        )   

    INSERT INTO CATSDB
            (
                [COUNTER],REFCOUNTER,PERNR,WORKDATE,CATSHOURS,APDAT,LAETM,CATS_STATUS,APPR_STATUS
            )
        VALUES
            ('000421692670',NULL,'00000071','20190114','6.00','20190204','174541','30','30'),
            ('000421692671',NULL,'00000071','20190114','3.00','20190204','174541','30','30'),
            ('000421692672',NULL,'00000071','20190115','6.00','00000000','000000','60','20'),
            ('000421692673',NULL,'00000071','20190115','3.00','00000000','000000','60','20'),
            ('000421692712','000421692672','00000071','20190115','0.00','20190115','111007','30','30'),
            ('000421692713','000421692673','00000071','20190115','0.00','20190115','111007','30','30'),
            ('000429718015',NULL,'00000072','20190313','7.00','00000000','000000','60','20'),
            ('000429718016',NULL,'00000072','20190313','1.50','20190315','164659','30','30'),
            ('000429718017',NULL,'00000072','20190313','1.00','20190315','164659','30','30'),
            ('000430154143',NULL,'00000072','20190313','2.00','00000000','000000','60','20'),
            ('000430154142','000429718015','00000072','20190313','5.00','00000000','000000','60','20'),
            ('000430154928','000430154142','00000072','20190313','4.50','20190315','164659','30','30'),
            ('000430154929','000430154143','00000072','20190313','2.50','20190315','164659','30','30'),
            ('000429774620',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
            ('000429774619',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
            ('000429802106','000429774620','00000152','20190314','2.00','00000000','000000','60','20'),
            ('000429802105','000429774619','00000152','20190314','3.00','00000000','000000','60','20'),
            ('000429840242','000429802106','00000152','20190314','4.00','20190315','143857','30','30'),
            ('000429840241','000429802105','00000152','20190314','5.00','20190315','143857','30','30')

    CREATE TABLE [TBL_COUNTER]
        (
            [COUNTER] [varchar](12) NOT NULL,
            [REFCOUNTER] [varchar](12) NULL
        )   

    CREATE TABLE TEMP
        (
            [COUNTER] [nvarchar](12) NOT NULL,
            [REFCOUNTER] [nvarchar](12) NULL,
            [PERNR] [nvarchar](8) NULL,
            [WORKDATE] [nvarchar](8) NULL,
            [CATSHOURS] [decimal](7, 3) NULL,
            [APDAT] [nvarchar](8) NULL,
            [LAETM] [nvarchar](6) NULL,
            [CATS_STATUS] [nvarchar](2) NULL,
            [APPR_STATUS] [nvarchar](2) NULL
        )       
END

BEGIN
    DECLARE     @COUNTER nvarchar(12),  
                @REFCOUNTER nvarchar(12),   
                @PERNR nvarchar(8), 
                @WORKDATE nvarchar(8),  
                @CATSHOURS decimal(7, 3),
                @APDAT nvarchar(8),
                @LAETM nvarchar(6),
                @CATS_STATUS nvarchar(2),
                @APPR_STATUS nvarchar(2)

    DECLARE @orig_counter nvarchar(12)
END

BEGIN
    DECLARE curs CURSOR FOR
        SELECT 
                [COUNTER],
                REFCOUNTER,
                PERNR,
                WORKDATE,
                CATSHOURS,
                APDAT,
                LAETM,
                CATS_STATUS,
                APPR_STATUS
        FROM 
                CATSDB
END

BEGIN
    OPEN curs
END

BEGIN
    FETCH NEXT FROM curs INTO
        @COUNTER,
        @REFCOUNTER,
        @PERNR,
        @WORKDATE,
        @CATSHOURS,
        @APDAT,
        @LAETM,
        @CATS_STATUS,
        @APPR_STATUS
END

BEGIN
    WHILE @@FETCH_STATUS = 0
        BEGIN
            BEGIN
                IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
                    BEGIN
                        INSERT INTO TBL_COUNTER
                                ([COUNTER]
                                ,REFCOUNTER)
                            VALUES
                                (@COUNTER
                                ,@REFCOUNTER)
                    END
            END
            BEGIN
                IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)
                    BEGIN
                            --If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
                            IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
                                BEGIN
                                    BEGIN
                                        WITH n([COUNTER], REFCOUNTER) AS 
                                            (
                                                SELECT 
                                                        cnt.[COUNTER], 
                                                        cnt.REFCOUNTER 
                                                FROM 
                                                        TBL_COUNTER cnt
                                                WHERE 
                                                        cnt.[COUNTER] = @REFCOUNTER
                                            UNION ALL
                                                SELECT 
                                                        nplus1.[COUNTER], 
                                                        nplus1.REFCOUNTER 
                                                FROM 
                                                        TBL_COUNTER as nplus1, 
                                                        n
                                                WHERE 
                                                        n.[COUNTER] = nplus1.REFCOUNTER
                                            )
                                        SELECT @orig_counter = [COUNTER] FROM n WHERE REFCOUNTER = '' OR REFCOUNTER IS NULL
                                    END
                                    BEGIN
                                        UPDATE TEMP
                                           SET 
                                               [REFCOUNTER] = @REFCOUNTER
                                              ,[PERNR] = @PERNR 
                                              ,[WORKDATE] = @WORKDATE                                               
                                              ,[CATSHOURS] = @CATSHOURS                                                                                    
                                              ,[APDAT] = @APDAT                                        
                                              ,[LAETM] = @LAETM
                                              ,[CATS_STATUS] = @CATS_STATUS
                                              ,[APPR_STATUS] = @APPR_STATUS                                        
                                            WHERE [COUNTER] = @orig_counter
                                    END
                                END
                            ELSE
                                BEGIN
                                    INSERT INTO TEMP
                                               ([COUNTER]
                                               ,[REFCOUNTER]                                               
                                               ,[PERNR]                                               
                                               ,[WORKDATE]                                               
                                               ,[CATSHOURS]                                             
                                               ,[APDAT]                                              
                                               ,[LAETM]
                                               ,[CATS_STATUS]                                               
                                               ,[APPR_STATUS])                                              
                                         VALUES
                                               (@COUNTER
                                               ,@REFCOUNTER                                              
                                               ,@PERNR                                               
                                               ,@WORKDATE                                             
                                               ,@CATSHOURS                                               
                                               ,@APDAT                                               
                                               ,@LAETM                                               
                                               ,@CATS_STATUS                                               
                                               ,@APPR_STATUS)                                               
                                END
                    END

            FETCH NEXT FROM curs INTO
                @COUNTER,
                @REFCOUNTER,
                @PERNR,
                @WORKDATE,
                @CATSHOURS,
                @APDAT,
                @LAETM,
                @CATS_STATUS,
                @APPR_STATUS
        END
    END
END

BEGIN
    CLOSE curs
    DEALLOCATE curs
END

I shortened it and created the tables for you all to be able to see what is going on. The expected result is

+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
|   COUNTER    |  REFCOUNTER  |  PERNR   | WORKDATE | CATSHOURS |  APDAT   | LAETM  | CATS_STATUS | APPR_STATUS |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| 000421692670 | NULL         | 00000071 | 20190114 |      6.00 | 20190204 | 174541 |          30 |          30 |
| 000421692671 | NULL         | 00000071 | 20190114 |      3.00 | 20190204 | 174541 |          30 |          30 |
| 000421692672 | 000421692672 | 00000071 | 20190115 |      0.00 | 20190115 | 111007 |          30 |          30 |
| 000421692673 | 000421692673 | 00000071 | 20190115 |      0.00 | 20190115 | 111007 |          30 |          30 |
| 000429718015 | 000430154142 | 00000072 | 20190313 |      4.50 | 20190315 | 164659 |          30 |          30 |
| 000429718016 | NULL         | 00000072 | 20190313 |      1.50 | 20190315 | 164659 |          30 |          30 |
| 000429718017 | NULL         | 00000072 | 20190313 |       1.0 | 20190315 | 164659 |          30 |          30 |
| 000430154143 | 000430154143 | 00000072 | 20190313 |      2.50 | 20190315 | 164659 |          30 |          30 |
| 000429774620 | 000429774620 | 00000152 | 20190314 |      2.00 | 00000000 | 000000 |          60 |          20 |
| 000429774619 | 000429802105 | 00000152 | 20190314 |      5.00 | 20190315 | 143857 |          30 |          30 |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+

I need to add to this. So there is two phases to this. The first phase is I will pull all the data from 2019 for an initial load of my table. Then on a weekly basis, I will pull the data from the origin source for new records and changed records from the last time i ran it. So I will not have the full chain every week. There needs to be a way to get back to the original counter value, without the full dataset, which is why i had the counter table. I apologize for not being more clear. I am swamped with work and havent been able to focus on this as much as I planned. I am trying all these different techniques.

Sandra Rossi
  • 11,934
  • 5
  • 22
  • 48
dk96m
  • 301
  • 3
  • 18
  • 3
    A Cursor is going to be literally the *worst* way to do this. You may well want to do this in batches, but a cursor means your going to do this Row By Agonising Row (RBAR). Perhaps you could give us some sample data and expected results to go with your SQL here? – Thom A Aug 21 '19 at 13:59
  • Looking at what you have, perhaps you're after a `MERGE`? – Thom A Aug 21 '19 at 14:10
  • 1
    I would rewrite this to 2 statements. One that updates all rows from the query that already exists, and one to insert all rows from the query that you dont have yet. Both should be run separate after each other. This way you dont need a cursor which is your performance killer here – GuidoG Aug 21 '19 at 14:13
  • Let me put together some sample data and get back to you. I know this is the worst way possible to do this. I am not an expert, so this is what I had come up with, but now that it has been running for 4 days, yea, not the best. The problem is that an original record can have more than 1 change to it, so there are multiple records with multiple refcounters, which is why i have the TCATMSTR_COUNTER table to walk back to get the original counter value. – dk96m Aug 21 '19 at 15:01
  • I cant add the queries to create the tables and add the data, it is too long. How do I provide that info. – dk96m Aug 21 '19 at 16:09
  • Maybe you can limit the data by using some of the business logic from SAP? There are rules describing when the counter gets updated. If you limit to only approved records, you will not have that many changes. More info: https://wiki.scn.sap.com/wiki/display/ERPHCM/The+Database+table+for+the+Time+Sheet+-+CATSDB – Mikael G Aug 21 '19 at 19:31
  • @MikaelG you cant do that because how will you find the original record to update? I update the OP with examples. If a record is changed 5 times in SAP, the last approved record has no idea what the original record is, unless you walk back the counter and refcounter to get the original. So if a record is approved then changed multiple times, you will need to get all the records. – dk96m Aug 22 '19 at 15:00
  • @dk96m, what version of SQL Server do you use? (`SELECT @@Version`) Please simplify your sample data (the only important columns are `Counter` and `RefCounter`, so keep those plus a couple more, like `BELNR` and `CHARGE_HOLD`. Nobody is going to read through 20 columns. At the same time add more rows to the sample data. Just one set of changes is not enough to verify that query works correctly. Add a couple more sets / chains of changes. – Vladimir Baranov Aug 27 '19 at 01:29
  • @dk96m Also, please clarify: does the last row in the chain of changes contain all the necessary data? I mean, if we find the first row in the chain with `Counter=420202428` and the last row with `Counter=420331792`, and then copy all values from all columns from the last row into the first row, would it be enough? This also should become clear if you provide good simplified sample data. – Vladimir Baranov Aug 27 '19 at 01:33
  • @dk96m, also please clarify, it looks like `Counter` is the primary key, right? – Vladimir Baranov Aug 27 '19 at 01:41
  • the issue is for the initial set up? or for the next updates? – Tuckbros Aug 28 '19 at 08:06
  • @VladimirBaranov, Microsoft SQL Server 2016 (X64) for the version. I will work on trying to make it even simpler. It is already pretty simple, IMO, but i understand the columns thing. I already cut it down alot. – dk96m Aug 30 '19 at 17:44
  • @VladimirBaranov, sort of with the last and first. The changes dont all come in at the same time. I might get the initial, then the next week two changes, then another the next week, so you will need to be able to walk back from the last change to the initial record whenever data comes in. But essentially yes, copy all the data from the last row into the initial row, but keeping the initial counter value. – dk96m Aug 30 '19 at 17:47
  • @VladimirBaranov yes i would say counter and mandt are the primary keys – dk96m Aug 30 '19 at 17:48
  • @VladimirBaranov ok, i update the OP to have less columns and more data. Does that help? – dk96m Aug 30 '19 at 21:06
  • Can you reexplain the output and extend the bounty or at least reexplain the output in best possible manner.Thanks. – KumarHarsh Sep 02 '19 at 10:35
  • @KumarHarsh Ill try to extend the bounty if I can figure out how to do that. As for the result. Ok, So for the initial load, I will be pulling in all 2019 data from the origin source. I will process it. After that I will only be pulling in new records or changed records since the last time I ran it, which will be weekly. So an original record comes in, that record is added to my table. A change was made to the original record, which creates a new record with a refcounter to the previous counter. I want to update that original counter record in my table with the data from the new record – dk96m Sep 03 '19 at 14:57
  • @KumarHarsh keeping the original record counter. See the example in the OP. – dk96m Sep 03 '19 at 14:58
  • Sorry for getting late.Check my answer .I hv not understood why you will insert or delete.what will select * from CATSDB look like ? – KumarHarsh Sep 11 '19 at 12:08

6 Answers6

1

I believe, following query would help you to start with and it's much efficient way to approach you goal.

It was created to maintain historical info of SQL Servers in central location, and performs following activities, you have to include/replace your table structures in respective blocks of script

  1. Creates temp table
  2. Collects the information from multiple servers using OPENQUERY via Lined Servers (source) and loads into Temp Table.
  3. Creates Indexes on Temp tables
  4. Loads the data into Central Table (destination) with 3 scenarios (as commented in script)

Note: Replaced the script as per your scenario


BEGIN
    Create Table #SrcTemp
                    (   AENAM nvarchar(12),
                        AUTYP nvarchar(2),
                        AWART nvarchar(4),
                        BELNR nvarchar(10),
                        CATSHOURS decimal(7, 3),
                        CATSQUANTITY decimal(18, 3),
                        CHARGE_HOLD nvarchar(24),
                        [COUNTER] nvarchar(12),
                        ERNAM nvarchar(12),
                        ERSDA nvarchar(8),
                        ERSTM nvarchar(6),
                        HRCOSTASG nvarchar(1),
                        LAEDA nvarchar(8),
                        LSTAR nvarchar(6),
                        LTXA1 nvarchar(40),
                        MANDT nvarchar(3),
                        PERNR nvarchar(8),
                        RAPLZL nvarchar(8),
                        RAUFPL nvarchar(10),
                        REFCOUNTER nvarchar(12),
                        RNPLNR nvarchar(12),
                        SKOSTL nvarchar(10),
                        CATS_STATUS nvarchar(2),
                        SUPP3 nvarchar(10),
                        WORKDATE nvarchar(8),
                        ZZOH_ORDER nvarchar(24),
                        APDAT nvarchar(8),
                        APNAM nvarchar(12),
                        LAETM nvarchar(6),
                        APPR_STATUS nvarchar(2)
                    );

--    DECLARE @orig_counter nvarchar(12)
END
    UPDATE #SrcTemp SET REFCOUNTER = '0' WHERE  REFCOUNTER = '' or REFCOUNTER is null;
    CREATE Clustered Index CLU_SrvTemp on #SrcTemp ([COUNTER], REFCOUNTER);

BEGIN

        INSERT INTO #SrcTemp
        SELECT 
                AENAM,AUTYP,AWART,BELNR,CATSHOURS,CATSQUANTITY,CHARGE_HOLD,[COUNTER],ERNAM,ERSDA,ERSTM,HRCOSTASG,LAEDA,LSTAR,LTXA1,MANDT,
                PERNR,RAPLZL,RAUFPL,REFCOUNTER,RNPLNR,SKOSTL,CATS_STATUS,SUPP3,WORKDATE,ZZOH_ORDER,APDAT,APNAM,LAETM,APPR_STATUS
        FROM    
                CATSDB;
END

--BEGIN
--    OPEN curs
--END

-- Scope: UNCHANGED Records ==================================================================================================================================

    IF EXISTS 
        (select *
         from   (
                    SELECT ROW_NUMBER () OVER (PARTITION BY [COUNTER] ORDER BY COUNTER) AS RN
                    FROM    #SrcTemp
                    WHERE REFCOUNTER = '0'
                ) as t where t.RN > 1
         )
        BEGIN
            RAISERROR ('Primary key violation occurred in "UNCHANGED" records processing block', 16, 1) with NOWAIT;
        END
    ELSE 

    BEGIN
    -- When NON-CHANGED Records NOT Existed in SQL table -------------------------------------------
            BEGIN
                INSERT INTO TEMP  ([AENAM],[AUTYP],[AWART],[BELNR],[CATSHOURS],[CATSQUANTITY],[CHARGE_HOLD],[COUNTER],[ERNAM]
                                    ,[ERSDA],[ERSTM],[HRCOSTASG],[LAEDA],[LSTAR],[LTXA1],[MANDT],[PERNR],[RAPLZL],[RAUFPL]
                                    ,[REFCOUNTER],[RNPLNR],[SKOSTL],[CATS_STATUS],[SUPP3],[WORKDATE],[ZZOH_ORDER],[APDAT],[APNAM]
                                    ,[LAETM],[APPR_STATUS]
                                    )
                SELECT    s.[AENAM], s.[AUTYP], s.[AWART], s.[BELNR], s.[CATSHOURS], s.[CATSQUANTITY], s.[CHARGE_HOLD], s.[COUNTER], s.[ERNAM]
                        , s.[ERSDA], s.[ERSTM], s.[HRCOSTASG], s.[LAEDA], s.[LSTAR], s.[LTXA1], s.[MANDT], s.[PERNR], s.[RAPLZL], s.[RAUFPL]
                        , s.[REFCOUNTER], s.[RNPLNR], s.[SKOSTL], s.[CATS_STATUS], s.[SUPP3], s.[WORKDATE], s.[ZZOH_ORDER], s.[APDAT], s.[APNAM]
                        , s.[LAETM], s.[APPR_STATUS]
                FROM    #SrcTemp as S
                        LEFT JOIN
                            TEMP as D on s.COUNTER = d.COUNTER
                WHERE (S.REFCOUNTER = '0') and D.COUNTER is null ;
            END

        -- When NON-CHANGED Records Existed in SQL table -------------------------------------------
            BEGIN
                UPDATE S
                    SET [AENAM] = D.AENAM
                        ,[AUTYP] = D.AUTYP
                        ,[AWART] = D.AWART
                        ,[BELNR] = D.BELNR
                        ,[CATSHOURS] = D.CATSHOURS
                        ,[CATSQUANTITY] = D.CATSQUANTITY
                        ,[CHARGE_HOLD] = D.CHARGE_HOLD
                        ,[ERNAM] = D.ERNAM
                        ,[ERSDA] = D.ERSDA
                        ,[ERSTM] = D.ERSTM
                        ,[HRCOSTASG] = D.HRCOSTASG
                        ,[LAEDA] = D.LAEDA
                        ,[LSTAR] = D.LSTAR
                        ,[LTXA1] = D.LTXA1
                        ,[MANDT] = D.MANDT
                        ,[PERNR] = D.PERNR
                        ,[RAPLZL] = D.RAPLZL
                        ,[RAUFPL] = D.RAUFPL
                        ,[REFCOUNTER] = D.REFCOUNTER
                        ,[RNPLNR] = D.RNPLNR
                        ,[SKOSTL] = D.SKOSTL
                        ,[CATS_STATUS] = D.CATS_STATUS
                        ,[SUPP3] = D.SUPP3
                        ,[WORKDATE] = D.WORKDATE
                        ,[ZZOH_ORDER] = D.ZZOH_ORDER
                        ,[APDAT] = D.APDAT
                        ,[APNAM] = D.APNAM
                        ,[LAETM] = D.LAETM
                        ,[APPR_STATUS] = D.APPR_STATUS
                FROM    #SrcTemp as S
                        LEFT JOIN
                            TEMP as D on    (s.COUNTER = d.COUNTER and S.REFCOUNTER = D.REFCOUNTER)
                WHERE (S.REFCOUNTER = '0') and D.COUNTER is NOT null 
            END
    END

-- Scope: CHANGED Records ==================================================================================================================================

    IF EXISTS 
        (select *
         from   (
                    SELECT ROW_NUMBER () OVER (PARTITION BY [COUNTER], REFCOUNTER ORDER BY [COUNTER]) AS RN
                    FROM    #SrcTemp
                    WHERE not REFCOUNTER = '0' 
                ) as t where t.RN > 1
         )
        BEGIN
            RAISERROR ('Primary key violation occurred in "CHANGED" records processing block', 10, 1) with NOWAIT;
        END
    ELSE

    BEGIN
        -- When CHANGED Records NOT Existed in SQL table -------------------------------------------
            BEGIN
                INSERT INTO TEMP  ([AENAM],[AUTYP],[AWART],[BELNR],[CATSHOURS],[CATSQUANTITY],[CHARGE_HOLD],[COUNTER],[ERNAM]
                                    ,[ERSDA],[ERSTM],[HRCOSTASG],[LAEDA],[LSTAR],[LTXA1],[MANDT],[PERNR],[RAPLZL],[RAUFPL]
                                    ,[REFCOUNTER],[RNPLNR],[SKOSTL],[CATS_STATUS],[SUPP3],[WORKDATE],[ZZOH_ORDER],[APDAT],[APNAM]
                                    ,[LAETM],[APPR_STATUS]
                                    )
                SELECT    s.[AENAM], s.[AUTYP], s.[AWART], s.[BELNR], s.[CATSHOURS], s.[CATSQUANTITY], s.[CHARGE_HOLD], s.[COUNTER], s.[ERNAM]
                        , s.[ERSDA], s.[ERSTM], s.[HRCOSTASG], s.[LAEDA], s.[LSTAR], s.[LTXA1], s.[MANDT], s.[PERNR], s.[RAPLZL], s.[RAUFPL]
                        , s.[REFCOUNTER], s.[RNPLNR], s.[SKOSTL], s.[CATS_STATUS], s.[SUPP3], s.[WORKDATE], s.[ZZOH_ORDER], s.[APDAT], s.[APNAM]
                        , s.[LAETM], s.[APPR_STATUS]
                FROM    #SrcTemp as S
                        LEFT JOIN
                            TEMP as D on s.COUNTER = d.COUNTER and S.REFCOUNTER = D.REFCOUNTER
                WHERE (not S.REFCOUNTER = '0') and D.COUNTER is null 
            END

        -- When NON-CHANGED Records Existed in SQL table -------------------------------------------
            BEGIN
                UPDATE S
                    SET [AENAM] = D.AENAM
                        ,[AUTYP] = D.AUTYP
                        ,[AWART] = D.AWART
                        ,[BELNR] = D.BELNR
                        ,[CATSHOURS] = D.CATSHOURS
                        ,[CATSQUANTITY] = D.CATSQUANTITY
                        ,[CHARGE_HOLD] = D.CHARGE_HOLD
                        ,[ERNAM] = D.ERNAM
                        ,[ERSDA] = D.ERSDA
                        ,[ERSTM] = D.ERSTM
                        ,[HRCOSTASG] = D.HRCOSTASG
                        ,[LAEDA] = D.LAEDA
                        ,[LSTAR] = D.LSTAR
                        ,[LTXA1] = D.LTXA1
                        ,[MANDT] = D.MANDT
                        ,[PERNR] = D.PERNR
                        ,[RAPLZL] = D.RAPLZL
                        ,[RAUFPL] = D.RAUFPL
                        ,[REFCOUNTER] = D.REFCOUNTER
                        ,[RNPLNR] = D.RNPLNR
                        ,[SKOSTL] = D.SKOSTL
                        ,[CATS_STATUS] = D.CATS_STATUS
                        ,[SUPP3] = D.SUPP3
                        ,[WORKDATE] = D.WORKDATE
                        ,[ZZOH_ORDER] = D.ZZOH_ORDER
                        ,[APDAT] = D.APDAT
                        ,[APNAM] = D.APNAM
                        ,[LAETM] = D.LAETM
                        ,[APPR_STATUS] = D.APPR_STATUS
                FROM    #SrcTemp as S
                        LEFT JOIN
                        TEMP as D on s.COUNTER = d.COUNTER and S.REFCOUNTER = D.REFCOUNTER
                WHERE (not S.REFCOUNTER = '0' ) and D.COUNTER is NOT null 
            END
    END

Drop table #SrcTemp;
Shekar Kola
  • 1,287
  • 9
  • 15
  • I am not sure how to translate this into my needs. Ill keep looking at it. – dk96m Aug 21 '19 at 16:39
  • see the OP for how it works with the counter and refcounter. I could be updating the original record many times, BUT the refcounter only refers to the counter from the previous change (or original record if it is the first change) – dk96m Aug 21 '19 at 17:23
  • As i understand, the table `TCATMSTR_COUNTER` looks like temp and updated during script execution. it would help us to understand scenario better, if you could provide some sample data of Source (SAP) and Destination table layout (including only necessary columns) – Shekar Kola Aug 21 '19 at 17:59
  • I have updated the original post to shorten it, create tables, insert data and then run the cursor. How can I improve this per your suggestion. – dk96m Aug 22 '19 at 14:56
  • Can you help me? It is still running, day 9 now. – dk96m Aug 26 '19 at 14:11
  • which you can validate using `select ROW_NUMBER () OVER (PARTITION BY COUNTER, REFCOUNTER ORDER BY COUNTER), * from CATSDB` – Shekar Kola Aug 26 '19 at 15:50
  • Counter is the key column, the ref counter, when populated, references an earlier row. So the references row is no longer the latest version, but is replaced by this new row. So it is like this. Each record in the table represents one entry in a persons timesheet for a specific task. So say today I enter 8 hours for task A. But tomorrow i go in and change it to 9 hours. What sap does is set the status of the original record to cancelled and creates a new record with the new information, being 9 hours instead of 8. – dk96m Aug 26 '19 at 18:10
  • For this new record, it has a new counter value and the recounter has the value of the counter from the original record. So it just doesnt update the original record from 8 hours to 9 hours. It creates a completely new record with a reference to the old one. What I want to do in my table is to keep the original counter value, but update everything else when new records come in that reference an older record. See my example. You should be able to run it and see what it is doing. – dk96m Aug 26 '19 at 18:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/198497/discussion-between-shekar-kola-and-dk96m). – Shekar Kola Aug 26 '19 at 18:13
  • This didnt work. I tried it and it still has the original record plus the update records. The update records should update the original record, but keeping the original counter. – dk96m Sep 19 '19 at 15:43
  • The chat closed? – dk96m Sep 23 '19 at 11:43
0

It looks like it can be done with a simple recursive query. Having suitable index is also important.

Sample data

This is how your sample data should look like in the question. Only few relevant columns. It would be better to include several sets/chains of changes, not just one. Having only this sample data would make it harder for you to verify if presented solutions are correct.

+-----------+---------------------+-----------+------------+
|   BELNR   |     CHARGE_HOLD     |  COUNTER  | REFCOUNTER |
+-----------+---------------------+-----------+------------+
| 417548605 | T4-GS023ABC2 0150#* | 420202428 | NULL       |
| 417549506 | T4-GS023-ABC2       | 420203329 | 420202428  |
| 417553156 | JGS023001    0010#* | 420206979 | 420203329  |
| 417557221 | T4-GS023-ABC2       | 420211044 | 420206979  |
| 417581675 | JGS023001    0010#* | 420235498 | 420211044  |
| 417677969 | JGS023001    0010#* | 420331792 | 420235498  |
+-----------+---------------------+-----------+------------+

The main recursive part of the query

WITH
CTE
AS
(
    SELECT
        1 AS Lvl,
        CATSDB.BELNR AS OriginalBELNR,
        CATSDB.CHARGE_HOLD AS OriginalCHARGE_HOLD,
        CATSDB.[COUNTER] AS OriginalCOUNTER,
        CATSDB.REFCOUNTER AS OrginalREFCOUNTER,
        CATSDB.BELNR AS NewBELNR,
        CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
        CATSDB.[COUNTER] AS NewCOUNTER,
        CATSDB.REFCOUNTER AS NewREFCOUNTER
    FROM
        CATSDB
    WHERE
        REFCOUNTER IS NULL

    UNION ALL

    SELECT
        CTE.Lvl + 1 AS Lvl,
        CTE.OriginalBELNR,
        CTE.OriginalCHARGE_HOLD,
        CTE.OriginalCOUNTER,
        CTE.OrginalREFCOUNTER,
        CATSDB.BELNR AS NewBELNR,
        CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
        CATSDB.[COUNTER] AS NewCOUNTER,
        CATSDB.REFCOUNTER AS NewREFCOUNTER
    FROM
        CATSDB
        INNER JOIN CTE ON CATSDB.REFCOUNTER = CTE.NewCOUNTER
)
SELECT * FROM CTE;

Intermediate result

+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+
| Lvl | OriginalBELNR | OriginalCHARGE_HOLD | OriginalCOUNTER | OrginalREFCOUNTER | NewBELNR  |   NewCHARGE_HOLD    | NewCOUNTER | NewREFCOUNTER |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+
|   1 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417548605 | T4-GS023ABC2 0150#* |  420202428 | NULL          |
|   2 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417549506 | T4-GS023-ABC2       |  420203329 | 420202428     |
|   3 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417553156 | JGS023001    0010#* |  420206979 | 420203329     |
|   4 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417557221 | T4-GS023-ABC2       |  420211044 | 420206979     |
|   5 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417581675 | JGS023001    0010#* |  420235498 | 420211044     |
|   6 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417677969 | JGS023001    0010#* |  420331792 | 420235498     |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+

You can see that we've taken the starting row of the chain (where RefCounter is NULL) and carried it over the whole chain of changes.

Now we just need to pick the rows with the last change, i.e. with the largest Lvl for each starting row. One way to do it is to use ROW_NUMBER function with suitable partitioning.

Final query

WITH
CTE
AS
(
    SELECT
        1 AS Lvl,
        CATSDB.BELNR AS OriginalBELNR,
        CATSDB.CHARGE_HOLD AS OriginalCHARGE_HOLD,
        CATSDB.[COUNTER] AS OriginalCOUNTER,
        CATSDB.REFCOUNTER AS OrginalREFCOUNTER,
        CATSDB.BELNR AS NewBELNR,
        CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
        CATSDB.[COUNTER] AS NewCOUNTER,
        CATSDB.REFCOUNTER AS NewREFCOUNTER
    FROM
        CATSDB
    WHERE
        REFCOUNTER IS NULL

    UNION ALL

    SELECT
        CTE.Lvl + 1 AS Lvl,
        CTE.OriginalBELNR,
        CTE.OriginalCHARGE_HOLD,
        CTE.OriginalCOUNTER,
        CTE.OrginalREFCOUNTER,
        CATSDB.BELNR AS NewBELNR,
        CATSDB.CHARGE_HOLD AS NewCHARGE_HOLD,
        CATSDB.[COUNTER] AS NewCOUNTER,
        CATSDB.REFCOUNTER AS NewREFCOUNTER
    FROM
        CATSDB
        INNER JOIN CTE ON CATSDB.REFCOUNTER = CTE.NewCOUNTER
)
,CTE_rn
AS
(
    SELECT
        *
        ,ROW_NUMBER() OVER (PARTITION BY OriginalCOUNTER ORDER BY Lvl DESC) AS rn
    FROM CTE
)
SELECT *
FROM CTE_rn
WHERE rn = 1
--OPTION (MAXRECURSION 0)
;

If you can have a chain longer than 100 you should add OPTION (MAXRECURSION 0) to the query, because by default SQL Server limits recursion depth to 100.

Result

+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+
| Lvl | OriginalBELNR | OriginalCHARGE_HOLD | OriginalCOUNTER | OrginalREFCOUNTER | NewBELNR  |   NewCHARGE_HOLD    | NewCOUNTER | NewREFCOUNTER | rn |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+
|   6 |     417548605 | T4-GS023ABC2 0150#* |       420202428 | NULL              | 417677969 | JGS023001    0010#* |  420331792 |     420235498 |  1 |
+-----+---------------+---------------------+-----------------+-------------------+-----------+---------------------+------------+---------------+----+

Efficiency

To make it work efficiently we need to have an index on REFCOUNTER column. Also, the query assumes that REFCOUNTER is NULL, not ''. If you have a mix of NULLs and empty strings, unify your data, otherwise an index would not be useful. This index is the minimum what you need to have.

Ideally, you should have a CLUSTERED index on REFCOUNTER column, because the query always selects all columns from the table.

CREATE CLUSTERED INDEX [IX_RefCounter] ON [dbo].[CATSDB]
(
    [REFCOUNTER] ASC
)

If you can't change the indexes of your original table, I would recommend to copy all millions of rows into a temp table and create this clustered index for that temp table.

I got a pretty good plan with this clustered index.

plan

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • Let me study this. Sorry for not getting back to your questions on the OP. I got pulled away on something else this week and havent had a chance to come back to this. I am also going to simplify my OP and add more data. Stand by. Might not be right away – dk96m Aug 30 '19 at 17:50
  • OK, so this would work for my initial load of my table from the cats table. BUT, after that, I will only be select data from catsdb where the last changed date is after the last time i ran it, so it won't have to look at billions of rows, but the ones that are new or changed since the last run. Does that make sense. That is why i get a refcounter table, to be able to track all of the counter/refcounter values – dk96m Aug 30 '19 at 17:55
  • Updated OP to have less columns and more data. – dk96m Aug 30 '19 at 21:06
  • So I think I am going to pull that data into a local table. Should there be a clustered index on counter and refcounter or just refcounter? – dk96m Aug 30 '19 at 21:29
  • I don't quite understand what you need. The query in this answer returns your expected results for the updated sample data as well. Except for one row with COUNTER = 000429774620. You sample data has three rows in that chain, but it looks like your sample result is calculated as if there were only two. Most likely it will take less than 9 days to process your full data set with this query, but you'll have to try yourself and compare performance. I think that ideal index would be as I wrote in the answer. – Vladimir Baranov Aug 31 '19 at 02:37
  • If you need a method to efficiently update your result set periodically, that is a significantly different problem. You'd better ask another question and show good sets of sample data - the initial, plus what result you need from it; then a batch of added rows, plus how your expected result changes. If the query that always processes the whole dataset is fast enough, you may not need to make it more complicated. – Vladimir Baranov Aug 31 '19 at 02:41
  • Yes, there is my initial data load. Then going forward after that I am only going to pull data from the data source for the rows that have changed since the last time i ran it. – dk96m Sep 03 '19 at 14:29
0

Few things you can do to improve performance:

Convert COUNTER and REFCOUNTER to datatype int from nvarchar, operations on int are much faster than characters. Do not use a cursors, you can still process one record at at time using a while loop.

DECLARE @CCOUNTER int = 0
WHILE (1 = 1)
BEGIN
    /* SELECT @COUNTER = MIN(COUNTER) > @COUNTER FROM CATSDB */
    /* IF @@ROWCOUNT != 1 THEN BREAK OUT OF THE WHILE LOOP, WE ARE DONE */
    /* SELECT RECORD FOR THIS @COUNTER FROM CATSDB */
    /* DO THE PROCESSING FOR THIS RECORD */
END
Neeraj Agarwal
  • 1,059
  • 6
  • 5
  • Do I still declare all the column variables and in the while loop select into those variables to process them? – dk96m Sep 03 '19 at 15:48
0

There is a method called sql Bulk copy i don't it will help in your problem but give it a try.

amaldec23
  • 245
  • 2
  • 11
0

The most performant way to do this is through BCP. https://learn.microsoft.com/en-us/sql/tools/bcp-utility?view=sql-server-2017.

You can BCP all of the data into a staging table in SQL Server and then run your inserts and updates. Also when checking for non-existence of a record to determine whether this is an insert or an update "IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)" is very expensive.

Example of a more performant way to do this: (Table names TBL_SOURCE, TBL_DESTINATION, #TBL_UPDATES, and #TBL_INSERTS)

SELECT * into #TBL_INSERTS
FROM TBL_SOURCE S
    left outer join TBL_DESTINATION D on S.COUNTER=D.COUNTER
WHERE D.Counter is null

SELECT * into #TBL_UPDATES
FROM TBL_SOURCE S
    left outer join TBL_DESTINATION D on S.COUNTER=D.COUNTER
WHERE D.Counter is not null

Updates will be captured in #tbl_updates and inserts in #tbl_inserts

0

See based few sample data and given output, our script cannot be 100% OK and optimized ,where millions of data to updated is concern.

I have confidence in my script that it can be improve in that direction,after fully understanding the requirement .

First of all I wonder why data type are nvarchar,if possible make it to varchar,int,datetime .

If you can change data type then it will do wonder to the performance.

Also there is no identity column which should be Clustered Index.

This two point matter from performance point of view.

So in my example,

CREATE TABLE CATSDB 
        (
            id int identity ,
            [COUNTER] nvarchar(12),
            REFCOUNTER nvarchar(12),
            PERNR nvarchar(8),
            WORKDATE nvarchar(8),
            CATSHOURS decimal(7, 3),
            APDAT nvarchar(8),
            LAETM nvarchar(6),
            CATS_STATUS nvarchar(2),
            APPR_STATUS nvarchar(2)
        )   

ALTER TABLE CATSDB
ADD CONSTRAINT PK_CATSDB_ID PRIMARY KEY CLUSTERED(ID)

CREATE NONCLUSTERED INDEX FICATSDB_REFCOUNTER ON CATSDB(REFCOUNTER,[COUNTER]);




IF OBJECT_ID('tempdb..#TEMP', 'U') IS NOT NULL
    DROP TABLE #TEMP;

CREATE TABLE #TEMP
(UpdateID      INT,
 FINDID        INT
 PRIMARY KEY,
 [COUNTER]     [NVARCHAR](12) NOT NULL,
 [REFCOUNTER]  [NVARCHAR](12) NULL,
 [PERNR]       [NVARCHAR](8) NULL,
 [WORKDATE]    [NVARCHAR](8) NULL,
 [CATSHOURS]   [DECIMAL](7, 3) NULL,
 [APDAT]       [NVARCHAR](8) NULL,
 [LAETM]       [NVARCHAR](6) NULL,
 [CATS_STATUS] [NVARCHAR](2) NULL,
 [APPR_STATUS] [NVARCHAR](2) NULL
);

WITH CTE
     AS (SELECT a.id,
                a.[COUNTER],
                a.REFCOUNTER,
                a.id AS Findid
         FROM dbo.CATSDB A

         UNION ALL
         SELECT b.id,
                a.[COUNTER],
                a.[refCOUNTER],
                a.id
         FROM dbo.CATSDB A
              INNER JOIN CTE b ON(a.REFCOUNTER = b.[COUNTER])
         WHERE a.id >= b.Findid),
     CTE1
     AS (SELECT id,
                MAX(Findid) Findid
         FROM CTE
         GROUP BY id)

     INSERT INTO #TEMP
     (UpdateID,
      FINDID,
      [COUNTER],
      [REFCOUNTER],
      [PERNR],
      [WORKDATE],
      [CATSHOURS],
      [APDAT],
      [LAETM],
      [CATS_STATUS],
      [APPR_STATUS]
     )
            SELECT c1.ID,
                   c1.FINDID,
                   a.COUNTER,
                   a.REFCOUNTER,
                   a.PERNR,
                   a.WORKDATE,
                   a.CATSHOURS,
                   a.APDAT,
                   a.LAETM,
                   a.CATS_STATUS,
                   a.APPR_STATUS
            FROM dbo.CATSDB A
                 INNER JOIN CTE1 c1 ON a.id = c1.Findid;

BEGIN TRY
    BEGIN TRAN;

    UPDATE A
      SET
          [REFCOUNTER] = b.REFCOUNTER,
          [PERNR] = b.PERNR,
          [WORKDATE] = b.WORKDATE,
          [CATSHOURS] = b.CATSHOURS,
          [APDAT] = b.APDAT,
          [LAETM] = b.LAETM,
          [CATS_STATUS] = b.CATS_STATUS,
          [APPR_STATUS] = b.APPR_STATUS
    FROM CATSDB A
         INNER JOIN #TEMP B ON a.id = b.UpdateID;

    -- this is only test query
    SELECT c1.UpdateID AS UpdateID,
           a.*
    FROM dbo.CATSDB A
         INNER JOIN #TEMP c1 ON a.id = c1.Findid;

    IF(@@trancount > 0)
        ROLLBACK; -- commit
END TRY
BEGIN CATCH
    IF(@@trancount > 0)
        ROLLBACK;
END CATCH;

#Temp should be permanent table.

IMO, your table badly need identity column which should be identity and Clustered Index.

You can try, you can Alter it .

REFCOUNTER,COUNTER should be Non Clustered Index.

After and only after optimizing the query and with proper PLAN above index is going to boost performance.

Proper PLAN : Should you use Recursive or RBAR and update millions of records in one time or Should I Batch update ?

You can first Test the script with millions of row with Rollback.

KumarHarsh
  • 5,046
  • 1
  • 18
  • 22
  • counter would be the primary key. – dk96m Sep 17 '19 at 14:13
  • Catsdb is the source data. I will be pulling from that source weekly to insert or update my table. New records will have a null/empty refcounter field. Updated records will have a refcounter. It could look like this Counter/refcounter 1/NULL, 2/1, 3/2, 4/3. That is all 1 record. Counter 1 was the original, Counter 2 was an update to 1, Counter 2 is now the updated record. Then 3 updates 2 and 3 becomes the updated record. 4 updates 3 and 4 becomes the record. The final record I want to show would be 1 with the updates from 4. – dk96m Sep 17 '19 at 14:16
  • Also, counter 1 and 2 could already be in my table, then next week 3 and 4 come in. Does this make sense? – dk96m Sep 17 '19 at 14:21
  • refcounter may not be null all the time – dk96m Sep 18 '19 at 12:06
  • @dk96m , You didn't say anything about my existing answer.If we talk about only output then I think part of output is correct. Isn;t it. I get back to you with my updated script after your feedback – KumarHarsh Sep 19 '19 at 11:22
  • Ok, I am taking a look at the query, seeing how it works when i use it. The identity column is actually COUNTER. That will be unique. Also, it isnt just where refcounter is null. The initial record will have a null refcounter, but when an update record comes in, that initial record should be updated with everything from the update record except the original counter value. So say the initial counter/refcounter is 1/NULL, then when an update record comes in with counter/refcounter 4/1 then it will update the initial record which will now have a counter/refcounter of 1/1. – dk96m Sep 19 '19 at 16:49
  • if a third update record comes in with counter/refcounter 8/4, then it will update the initial record and will be 1/4. Hope that makes sense. Only the first record will have a null refcounter. Any update after that it won't, unless we dont update the initial record with the refcounter value, just everything else – dk96m Sep 19 '19 at 16:51