This is sort of a follow up question I asked here: SQL Server better way to iterate through millions of rows. I was told in that thread to submit a new question because what I really was looking for was different than the way I asked it.
I was not initially clear in my OP that this procedure will be run periodically. So this will not be run one time, it will be run weekly and the procedure needs to be able to check the counter table and the temp table.
I am working with SAP Timesheet data, so there are millions of rows. What I am trying to do is select the data from the SAP table and insert it into a table on MS SQL Server.
So I want to insert the original record(if it doesnt already exist), then if an update to the original record happens, which is in the form of a new SAP record with a refcounter
, I want to find the original record in my table and update it, keeping the original counter value.
So I have done this successfully with a cursor (I know not the best), but with millions of records, I am wondering if there is a faster way, because I am on day 4 of my cursor running. Is there a better way then what I have below:
BEGIN
CREATE TABLE CATSDB
(
[COUNTER] nvarchar(12),
REFCOUNTER nvarchar(12),
PERNR nvarchar(8),
WORKDATE nvarchar(8),
CATSHOURS decimal(7, 3),
APDAT nvarchar(8),
LAETM nvarchar(6),
CATS_STATUS nvarchar(2),
APPR_STATUS nvarchar(2)
)
INSERT INTO CATSDB
(
[COUNTER],REFCOUNTER,PERNR,WORKDATE,CATSHOURS,APDAT,LAETM,CATS_STATUS,APPR_STATUS
)
VALUES
('000421692670',NULL,'00000071','20190114','6.00','20190204','174541','30','30'),
('000421692671',NULL,'00000071','20190114','3.00','20190204','174541','30','30'),
('000421692672',NULL,'00000071','20190115','6.00','00000000','000000','60','20'),
('000421692673',NULL,'00000071','20190115','3.00','00000000','000000','60','20'),
('000421692712','000421692672','00000071','20190115','0.00','20190115','111007','30','30'),
('000421692713','000421692673','00000071','20190115','0.00','20190115','111007','30','30'),
('000429718015',NULL,'00000072','20190313','7.00','00000000','000000','60','20'),
('000429718016',NULL,'00000072','20190313','1.50','20190315','164659','30','30'),
('000429718017',NULL,'00000072','20190313','1.00','20190315','164659','30','30'),
('000430154143',NULL,'00000072','20190313','2.00','00000000','000000','60','20'),
('000430154142','000429718015','00000072','20190313','5.00','00000000','000000','60','20'),
('000430154928','000430154142','00000072','20190313','4.50','20190315','164659','30','30'),
('000430154929','000430154143','00000072','20190313','2.50','20190315','164659','30','30'),
('000429774620',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429774619',NULL,'00000152','20190314','1.00','00000000','000000','60','20'),
('000429802106','000429774620','00000152','20190314','2.00','00000000','000000','60','20'),
('000429802105','000429774619','00000152','20190314','3.00','00000000','000000','60','20'),
('000429840242','000429802106','00000152','20190314','4.00','20190315','143857','30','30'),
('000429840241','000429802105','00000152','20190314','5.00','20190315','143857','30','30')
CREATE TABLE [TBL_COUNTER]
(
[COUNTER] [varchar](12) NOT NULL,
[REFCOUNTER] [varchar](12) NULL
)
CREATE TABLE TEMP
(
[COUNTER] [nvarchar](12) NOT NULL,
[REFCOUNTER] [nvarchar](12) NULL,
[PERNR] [nvarchar](8) NULL,
[WORKDATE] [nvarchar](8) NULL,
[CATSHOURS] [decimal](7, 3) NULL,
[APDAT] [nvarchar](8) NULL,
[LAETM] [nvarchar](6) NULL,
[CATS_STATUS] [nvarchar](2) NULL,
[APPR_STATUS] [nvarchar](2) NULL
)
END
BEGIN
DECLARE @COUNTER nvarchar(12),
@REFCOUNTER nvarchar(12),
@PERNR nvarchar(8),
@WORKDATE nvarchar(8),
@CATSHOURS decimal(7, 3),
@APDAT nvarchar(8),
@LAETM nvarchar(6),
@CATS_STATUS nvarchar(2),
@APPR_STATUS nvarchar(2)
DECLARE @orig_counter nvarchar(12)
END
--The below cursor will include a where statement where it selects data
--where the workdate or apdat fall within a data range. Since I am
--running this weekly, the to and from dates will be the last work week.
BEGIN
DECLARE curs CURSOR FOR
SELECT
[COUNTER],
REFCOUNTER,
PERNR,
WORKDATE,
CATSHOURS,
APDAT,
LAETM,
CATS_STATUS,
APPR_STATUS
FROM
CATSDB
END
BEGIN
OPEN curs
END
BEGIN
FETCH NEXT FROM curs INTO
@COUNTER,
@REFCOUNTER,
@PERNR,
@WORKDATE,
@CATSHOURS,
@APDAT,
@LAETM,
@CATS_STATUS,
@APPR_STATUS
END
BEGIN
WHILE @@FETCH_STATUS = 0
BEGIN
BEGIN
IF NOT EXISTS (SELECT * FROM TBL_COUNTER WHERE [COUNTER] = @COUNTER)
BEGIN
INSERT INTO TBL_COUNTER
([COUNTER]
,REFCOUNTER)
VALUES
(@COUNTER
,@REFCOUNTER)
END
END
BEGIN
IF NOT EXISTS (SELECT * FROM TEMP WHERE [COUNTER] = @COUNTER)
BEGIN
--If REFCOUNTER is populated, get the original COUNTER value, then update that row with the new values. Otherwise insert new record
IF @REFCOUNTER <> '' AND @REFCOUNTER IS NOT NULL
BEGIN
BEGIN
WITH n([COUNTER], REFCOUNTER) AS
(
SELECT
cnt.[COUNTER],
cnt.REFCOUNTER
FROM
TBL_COUNTER cnt
WHERE
cnt.[COUNTER] = @REFCOUNTER
UNION ALL
SELECT
nplus1.[COUNTER],
nplus1.REFCOUNTER
FROM
TBL_COUNTER as nplus1,
n
WHERE
n.[COUNTER] = nplus1.REFCOUNTER
)
SELECT @orig_counter = [COUNTER] FROM n WHERE REFCOUNTER = '' OR REFCOUNTER IS NULL
END
BEGIN
UPDATE TEMP
SET
[REFCOUNTER] = @REFCOUNTER
,[PERNR] = @PERNR
,[WORKDATE] = @WORKDATE
,[CATSHOURS] = @CATSHOURS
,[APDAT] = @APDAT
,[LAETM] = @LAETM
,[CATS_STATUS] = @CATS_STATUS
,[APPR_STATUS] = @APPR_STATUS
WHERE [COUNTER] = @orig_counter
END
END
ELSE
BEGIN
INSERT INTO TEMP
([COUNTER]
,[REFCOUNTER]
,[PERNR]
,[WORKDATE]
,[CATSHOURS]
,[APDAT]
,[LAETM]
,[CATS_STATUS]
,[APPR_STATUS])
VALUES
(@COUNTER
,@REFCOUNTER
,@PERNR
,@WORKDATE
,@CATSHOURS
,@APDAT
,@LAETM
,@CATS_STATUS
,@APPR_STATUS)
END
END
FETCH NEXT FROM curs INTO
@COUNTER,
@REFCOUNTER,
@PERNR,
@WORKDATE,
@CATSHOURS,
@APDAT,
@LAETM,
@CATS_STATUS,
@APPR_STATUS
END
END
END
BEGIN
CLOSE curs
DEALLOCATE curs
END
I shortened it and created the tables for you all to be able to see what is going on. The expected result is
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| COUNTER | REFCOUNTER | PERNR | WORKDATE | CATSHOURS | APDAT | LAETM | CATS_STATUS | APPR_STATUS |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
| 000421692670 | NULL | 00000071 | 20190114 | 6.00 | 20190204 | 174541 | 30 | 30 |
| 000421692671 | NULL | 00000071 | 20190114 | 3.00 | 20190204 | 174541 | 30 | 30 |
| 000421692672 | 000421692672 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000421692673 | 000421692673 | 00000071 | 20190115 | 0.00 | 20190115 | 111007 | 30 | 30 |
| 000429718015 | 000430154142 | 00000072 | 20190313 | 4.50 | 20190315 | 164659 | 30 | 30 |
| 000429718016 | NULL | 00000072 | 20190313 | 1.50 | 20190315 | 164659 | 30 | 30 |
| 000429718017 | NULL | 00000072 | 20190313 | 1.0 | 20190315 | 164659 | 30 | 30 |
| 000430154143 | 000430154143 | 00000072 | 20190313 | 2.50 | 20190315 | 164659 | 30 | 30 |
| 000429774620 | 000429774620 | 00000152 | 20190314 | 2.00 | 00000000 | 000000 | 60 | 20 |
| 000429774619 | 000429802105 | 00000152 | 20190314 | 5.00 | 20190315 | 143857 | 30 | 30 |
+--------------+--------------+----------+----------+-----------+----------+--------+-------------+-------------+
I will pull the data from the origin source for new records and changed records from the last time i ran it. So I will not have the full chain every week. There needs to be a way to get back to the original counter value, without the full dataset, which is why i had the counter table. I apologize for not being more clear.
I hope what I am trying to do is clear. As i stated, i have it working with a cursor, but it is so slow. Processing a week work of data is taking at least 24 hours.