0

PreNote: aka washing of hands; this is work being done on a Brownfield Project


I have a "ProductLine" table as follows

| ProductLineID (pk) | ProductID (fk) | ResellerID (fk) | Other stuff |
|--------------------|----------------|-----------------|-------------|
| 1                  | 28             | 298818          |    --       |

The current system has a table of Product Template lines, that creates a set of product lines each time a new reseller is created, linked to that reseller. The idea being that if that Reseller wished to edit the product for their organisation it would be displayed based on their account.

These product lines are used on the sale line table which is linked to a sale table(which is linked to the cart table).

There are a couple of tables hooked up to the product lines for various reasons.

What I was looking at doing was making a de-duped copy of the product lines and dropping some of the data so a new line would only be created IF the reseller made a change; thus reducing the page from > 124,000 rows down to 69 (no ones used the functionality in 5 years).

Then using the old ProductLine table as a reference, altering the existing data (ProductLineId's in the sale line table) to point to the new ProductLineID, by reading the original lines ProductID and finding the new matching LineID (one per product funnily enough).

I was wondering what the best way to do this would be; a cursor springs to mind but tends to bring out DBA's from far and wide with a pitchfork, and I'll probably need to do a similar query on several tables so the less painful the SQL the better.

Just to make the visualisation a little easier the sale line is like this

| SaleLineId (pk) | SaleID (fk) | ProductLineId (fk) | Price |
|-----------------|-------------|--------------------|-------|
| 1992            | 29          | 10283              | 9.00  |

Extra

I plan to rename the old ProductLine table to LegacyProductLine. Then dedupe + insert the product lines from there into a clean ProductLineTable.

I then need to replace the ProductLineId's in the SalesLine (and others) with the new ProductLineId.

The LegacyProductLine wont know what the ProductLineID is in the ProductLineTable; hence I was looking at the ProductID as a way of matching them up as there is no other matching parameters.



    +-----------------+     +-----------------+          +------------------+
    |LegacyProductLine|     | ProductLine     |          |  SaleLine        |
    |-----------------|     |-----------------|          |------------------|
    |ProductLineId PK |     | ProductLineID PK|          | SaleLineId    PK |
    |ProductName      |     | ProductName     |          | ProductLineId FK |
    |... some stuff   |     | ... Some stuff  |          | Charge           |
    |ResellerID  FK   |     |                 |          |                  |
    |ProductID FK     |     | ProductId       |          |                  |
    |                 |     |                 |          |                  |
    |                 |     |                 |          |                  |
    |                 |     |                 |          |                  |
    |                 |     |                 |          |                  |
    |                 |     |                 |          +------------------+
    |                 |     |                 |
    |                 |     |                 |
    |                 |     |                 |
    |                 |     |                 |
    +-----------------+     +-----------------+
     200K rows               26 Rows
     Mostly Duplicates       Deduped Data

The legacy table is temporary only, for reference, and will be deleted. I need to change the ProductLineID in the SaleLine Table.

The SaleLine Table currently contains the ProductLineId's from the Legacy Table; These need updating to use the ProductLineId's in the ProductLine table.

Chris McKee
  • 4,298
  • 10
  • 48
  • 83
  • 1
    You already alluded to NOT using it, but if you are considering a cursor...the same end result can be accomplished using a WHILE loop. NEVER should a cursor be used. :) – Justin Pihony Mar 01 '12 at 16:31
  • Just to clarify, you want an idea on how to do the entire process? As in, how to create the new table, dedupe while creating a mapping, and then remap the relationships with the new id? – Justin Pihony Mar 01 '12 at 17:13
  • No I can de-dupe to a new table. The question is more on how to re-map (change the data) in the SalesLine so it uses the new "master" De-duped ProductLine, considering the only link between the old ProductLine table and the new De-duped ProductLine table is the ProductId. – Chris McKee Mar 01 '12 at 17:23
  • Wait, so you are saying that oldPL.ProductLine = newPL.ProductLine? That is what it sounds like when you say that there is a link between the two on that column. – Justin Pihony Mar 01 '12 at 21:30
  • Wondered where you might of got that from... then I read what I put under extra. I've altered it to clarify. Hope thats better >< sorry. – Chris McKee Mar 01 '12 at 23:07
  • Are you still dealing with this? I got busy and am just returning to this, but looking at your new updates, it looks like what you really need is to make your dedupe process also create a mapping table with old id's to new. If you still need, I can create a query that should make that pretty simple, just let me know :) – Justin Pihony Mar 08 '12 at 03:32
  • Cool, would be nice to tick the answer ;) That and I'd hate to pollute the web with my own answer ;) – Chris McKee Mar 08 '12 at 16:48
  • Updated my answer to be the full answer set for creating a mapping table, de-duping using the mapping table, and then fixing up the sales table with the mapping table. I realized I did not put a ton of comments to explain. Let me know if you need them :) Hopefully, this should make your task very easy now :) – Justin Pihony Mar 08 '12 at 18:43

1 Answers1

1

By the sound of this, I am not sure you even need a loop*. Here is my proposed solution based on the assumption below

When you create your new ProductLine (PL) table with de-duped data, then you will need to create a mapping table from NewPL to OldPL (Map_OldPL_NewPL). This makes the problem trivial:

UPDATE SalesLine
SET PLId = NewPLId
FROM SalesLine
    JOIN Map_OldPL_NewPL AS Map
        ON SalesLine.PLId = OldPLId

However, please clarify my assumption below, as I am guessing you are asking more about how to de-dupe the ProductLine, since this solution is so trivial.

*Assuming that you already have a mechanism in place to create the de-duped productline table. But, maybe that IS what you are asking, in which case, could you please clarify that to keep others from assuming the same :). I will have to expand my answer in that case :)

UPDATE:

Here is the full answer. You could probably do this all within one or two queries, but this way you can always have the mapping table to look back on. I am assuming that it is a duplicate if everything except for the PK (ProductLineId) are the same. If not, then you would need to modify the ROW_NUMBER partition and the following update.

CREATE TABLE DuplicateMapping
(
    OldProductLineId INT, 
    ProductName VARCHAR(MAX), 
    ... , 
    ResellerId INT, 
    ProductId INT
    DuplicateHierarchy INT,
    NewProductLineId INT
)

INSERT INTO DuplicateMapping
SELECT  ProductLineId AS OldProductLineId, ProductName, ... , ResellerId, ProductId, 
    ROW_NUMBER() OVER 
        (PARTITION BY ProductName, 
            ... , ResellerId, ProductId ORDER BY ProductLineId) AS DuplicateHierarchy,
    ProductLineId AS NewProductLineId
FROM ProductLine

UPDATE DuplicateMapping
SET NewProductLineId = Dup.OldProductLine
FROM DuplicateMapping AS Main
    JOIN DuplicateMapping AS Dup
        ON DuplicateMapping.ProductName = Dup.ProductName
            AND DuplicateMapping.ResellerId = Dup.ResellerId
            AND DuplicateMapping.ProductId = Dup.ProductId
            ...
            --Do NOT include OldProductLineId, NewProductLineId or DuplicateHierarchy
WHERE Dup.DuplicateHierarchy = 1

DELETE ProductLine
WHERE EXISTS 
(
    SELECT 1 
    FROM DuplicateMapping
    WHERE DuplicateMapping.ProductLineId = ProductLine.ProductLineId
        AND DuplicateMapping.DuplicateHierarchy > 1
)

UPDATE SaleLine
SET ProductLineId = NewProductLineId
FROM SaleLine
    JOIN DuplicateMapping
        ON ProductLineId = OldProductLineId
--Without this, you would not cause any harm
--Howerver, why update the same value over itself 
WHERE DuplicateHierarchy > 1
Justin Pihony
  • 66,056
  • 18
  • 147
  • 180