0

I have two tables and I need to be able to update them going both ways, the first table is a list of items:

ITEMS

Item* | Rev* | RDate        | ECO  | New
------+------+--------------+------+----
A     | 0A   | 2019-01-01   | E123 | 1
A     | 01   | 2018-01-01   | E456 | 0
B     | 0A   | 2018-12-31   | E765 | 0
C     | 01   | 2018-10-25   | E456 | 0

The second is a parent-child table, with revisions, but I have to fill in the Child Rev from the Item table

Tree

Parent* | ParentRev* | Child* | ChildRev | VDate*
--------+------------+--------+----------+-----------
Y       | 0B         | C      | NULL     | 2019-01-01
Y       | 0C         | D      | NULL     | 2019-01-13
Z       | 01         | A      | NULL     | 2018-06-25
Z       | 02         | A      | NULL     | 2019-01-11
Z       | 0A         | B      | NULL     | 2019-01-01

Notes:

  • Primary key columns are marked with *
  • VDate shouldn't be part of the primary key, but the dataset is bad and has duplicates, so I need to add it

I looked into different questions like Select first row in each GROUP BY group?, but I couldn't find one that used row based conditions on the joining table that returned multiple fields. Anyway, I'm using this to fill in the records where ChildRev is NULL, but it doesn't include the ECO column

UPDATE T 
SET [ChildRev] = (SELECT TOP 1 I.[Rev] AS [ChildRev]
                  FROM [Items] AS I
                  WHERE (I.[Item] = T.[Child]
                    AND I.[RDate] <= T.[VDate]) 
                  ORDER BY I.[RDate] DESC
                 )
FROM [Tree] AS T
WHERE T.[ChildRev] IS NULL

And, this is what I get:

Parent | ParentRev | Child | ChildRev | VDate      | ECO
-------+-----------+-------+----------+------------+------
Y      | 0B        | C     | 01       | 2019-01-01 | NULL
Y      | 0C        | D     | NULL     | 2019-01-13 | NULL
Z      | 01        | A     | 01       | 2018-06-25 | NULL
Z      | 02        | A     | 0A       | 2019-01-11 | NULL
Z      | 0A        | B     | 0A       | 2019-01-01 | NULL

I'm dealing with 4.5M+ records in the Tree table and 1.2M+ in the Item table, growing daily. I have 2 questions:

  1. Is there a better (faster) way to update the Tree Table? (Bonus if it includes the ECO)

    When I add new Items, they are flagged with a 1 in the New field (might use trigger)

  2. How would I Check/Update the Tree table with the new Items

Mind you that I have no real control in what order the data will get loaded (table or date).


Update

So, apparently Select first row in each GROUP BY group? was basically the solution, I just didn't realize it. Specifically on how to use a CTE to Update my data tables. Thanks @Xedni for enlightening me; I've only really used CTEs for recursive queries. So, I ended up with 2 similar CTEs,

  1. When I add new records to the Tree table, I added AND ChildRev IS NULL to limit the updates:

    WITH CTE AS
    (
        SELECT ...
    )
    UPDATE CTE
    SET ChildRev = ItemRev
    WHERE RID = 1
      AND ChildRev IS NULL
    
  2. When I add new records to the Materials table, I added a WHERE...ANY clause:

    WITH CTE AS
    (
        SELECT 
            ...
            RID = ROW_NUMBER() OVER (PARTITION BY t.Parent, t.ParentRev, t.Child 
                                     ORDER BY i.RDate DESC)
        FROM #Tree t
        JOIN #Items i
          ON t.Child = i.Item
         AND i.RDate <= t.VDate
        WHERE I.Process = ANY (SELECT Item FROM #Items WHERE New = 1)
    )
    UPDATE CTE
    SET ChildRev = ItemRev
    WHERE RID = 1
    
Community
  • 1
  • 1
Profex
  • 1,370
  • 8
  • 20
  • Am I correct in assuming your first table listed is `[Items]` and your second table listed is `[Tree]`? Also what are the primary keys of the different tables? – Xedni Jan 15 '19 at 19:15
  • Yes, I'll fix it. Primary keys are `Item | Rev` for `Items` table and `Parent | ParentRev | Child | VDate` for the `Tree` table – Profex Jan 15 '19 at 19:43
  • I.[Item] = T.[Child] . D does not exist in both tables, so in essence the answer really is null, am i missing something here – MichaelEvanchik Jan 15 '19 at 19:51
  • @MichaelEvanchik, For the most part, both `Parent` & `Child` come from the `Item` table. – Profex Jan 15 '19 at 20:10

1 Answers1

1

Instead of using a correlated subquery in the UPDATE clause, you could get the values you need via a join. First, create a derived table that looks almost identical to your correlated subquery, and get whatever unique values you need to identify the rows from #Items you want to associated with rows in #Tree. Since thereh isn't any indication of unique constraints on the tables mentioned, I had to sort of guess at this.

Setting up Sample Data

-- Setting up sample data
if object_id('tempdb.dbo.#Items') is not null drop table #Items
create table #Items
(
    Item char(1),
    Rev char(2),
    RDate date,
    ECO char(4),
    New bit
)

insert into #Items (Item, Rev, RDate, ECO, New)
values 
    ('A', '0A', '2019-01-01', 'E123', 1),
    ('A', '01', '2018-01-01', 'E456', 0),
    ('B', '0A', '2018-12-31', 'E765', 0),
    ('C', '01', '2019-01-01', 'E456', 0)

if object_id('tempdb.dbo.#Tree') is not null drop table #Tree
create table #Tree
(
    Parent char(1),
    ParentRev char(2),
    Child char(1),
    ChildRev char(2),
    VDate date,
    ECO char(4)
)
insert into #Tree (Parent, ParentRev, Child, ChildRev, VDate)
values
    ('Y', '0B', 'C', NULL, '2019-01-01'),
    ('Y', '0C', 'D', NULL, '2019-01-13'),
    ('Z', '01', 'A', NULL, '2018-06-25'),
    ('Z', '02', 'A', NULL, '2019-01-11'),
    ('Z', '0A', 'B', NULL, '2019-01-01')

Now that you have that derived table mapping the rows in #tree to the rows with the dates you want from #items, join that once more to the #items table to get ECO, Rev, and whatever else you want.

-- Actual Update Statement
update a
set ChildRev = c.Rev,
    Eco = c.Eco
from #Tree a
-- Consruct a derived table basically mapping the rows in #tree to the rows with the desired dates you want.
inner join 
(
    select t.Child, t.ParentRev, MaxRDate = max(i.RDate)
    from #Tree t
    inner join #Items i
        on t.Child = i.Item
            and i.RDate <= t.VDate
    group by t.Child, t.ParentRev
) b
    on a.Child = b.Child
        and a.ParentRev = b.ParentRev
-- Finally, join the "intermidate mapping table" to #Items to get the values (eco, rev, etc.) you actually want
inner join #Items c
    on b.Child = c.Item
        and b.MaxRDate = c.RDate

select top 1000 *
from #Tree

Generally speaking this is probably going to perform better than a correlated subquery, although depending on what indexes exist your milage may differ. Also, if you're truely running through 4.5 million records like this, consider breaking it into batches, or figuring out a way you can pre-filter the stuff you need to update ahead of time.

As for kicking this process off when a new row comes in you have two options.

  1. In whatever procedure inserts the data that sets the new flag, have it kick off this process at the same time (or something similar to do both within the same transaction).
  2. If that's not an option, you could in theory do the same thing with a trigger on the Items table, kicking off this process as needed. Although TBH I'd recommend the former as it's much easier to contain all the logic you need in the same place, and not have the extra overhead of having a trigger, which also somewhat obfuscates the process of keeping the data in sync.

Another Alternative

Another approach I just worked out is to do everything in a single query. Use a CTE (or derived table; whichever you wish) with a row_number RID. Then update that where RID = 1

;with src as
(
    select 
        t.Parent,
        t.ParentRev,
        t.Child,
        t.ChildRev, 
        t.VDate,
        t.ECO,
        Item = i.Item,
        ItemRev = i.Rev,
        ItemRDate = i.RDate,
        ItemECO = i.ECO,
        ItemNew = i.NEW,
        RID = row_number() over (partition by t.Parent, t.ParentRev, t.Child order by i.RDate desc)
    from #Tree t
    inner join #Items i
        on t.Child = i.Item
            and i.RDate <= t.VDate
)
update src
set ECO = ItemECO,
    ChildREv = ItemRev
where RID = 1
Xedni
  • 3,662
  • 2
  • 16
  • 27