I have two tables and I need to be able to update them going both ways, the first table is a list of items:
ITEMS
Item* | Rev* | RDate | ECO | New
------+------+--------------+------+----
A | 0A | 2019-01-01 | E123 | 1
A | 01 | 2018-01-01 | E456 | 0
B | 0A | 2018-12-31 | E765 | 0
C | 01 | 2018-10-25 | E456 | 0
The second is a parent-child table, with revisions, but I have to fill in the Child Rev from the Item table
Tree
Parent* | ParentRev* | Child* | ChildRev | VDate*
--------+------------+--------+----------+-----------
Y | 0B | C | NULL | 2019-01-01
Y | 0C | D | NULL | 2019-01-13
Z | 01 | A | NULL | 2018-06-25
Z | 02 | A | NULL | 2019-01-11
Z | 0A | B | NULL | 2019-01-01
Notes:
- Primary key columns are marked with *
VDate
shouldn't be part of the primary key, but the dataset is bad and has duplicates, so I need to add it
I looked into different questions like Select first row in each GROUP BY group?, but I couldn't find one that used row based conditions on the joining table that returned multiple fields. Anyway, I'm using this to fill in the records where ChildRev is NULL
, but it doesn't include the ECO
column
UPDATE T
SET [ChildRev] = (SELECT TOP 1 I.[Rev] AS [ChildRev]
FROM [Items] AS I
WHERE (I.[Item] = T.[Child]
AND I.[RDate] <= T.[VDate])
ORDER BY I.[RDate] DESC
)
FROM [Tree] AS T
WHERE T.[ChildRev] IS NULL
And, this is what I get:
Parent | ParentRev | Child | ChildRev | VDate | ECO
-------+-----------+-------+----------+------------+------
Y | 0B | C | 01 | 2019-01-01 | NULL
Y | 0C | D | NULL | 2019-01-13 | NULL
Z | 01 | A | 01 | 2018-06-25 | NULL
Z | 02 | A | 0A | 2019-01-11 | NULL
Z | 0A | B | 0A | 2019-01-01 | NULL
I'm dealing with 4.5M+ records in the Tree table and 1.2M+ in the Item table, growing daily. I have 2 questions:
Is there a better (faster) way to update the
Tree
Table? (Bonus if it includes theECO
)When I add new
Items
, they are flagged with a1
in theNew
field (might use trigger)How would I Check/Update the
Tree
table with the newItems
Mind you that I have no real control in what order the data will get loaded (table or date).
Update
So, apparently Select first row in each GROUP BY group? was basically the solution, I just didn't realize it. Specifically on how to use a CTE to Update my data tables. Thanks @Xedni for enlightening me; I've only really used CTEs for recursive queries. So, I ended up with 2 similar CTEs,
When I add new records to the
Tree
table, I addedAND ChildRev IS NULL
to limit the updates:WITH CTE AS ( SELECT ... ) UPDATE CTE SET ChildRev = ItemRev WHERE RID = 1 AND ChildRev IS NULL
When I add new records to the
Materials
table, I added aWHERE...ANY
clause:WITH CTE AS ( SELECT ... RID = ROW_NUMBER() OVER (PARTITION BY t.Parent, t.ParentRev, t.Child ORDER BY i.RDate DESC) FROM #Tree t JOIN #Items i ON t.Child = i.Item AND i.RDate <= t.VDate WHERE I.Process = ANY (SELECT Item FROM #Items WHERE New = 1) ) UPDATE CTE SET ChildRev = ItemRev WHERE RID = 1