0

Consider following table:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Product](
    [ProductID] [int] IDENTITY(1,1) NOT NULL,
    [ProductCategory] [int] NOT NULL,
    [ProductCategoryGuid] [uniqueidentifier] NULL,
 CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED 
(
    [ProductID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO
SET IDENTITY_INSERT [dbo].[Product] ON 

GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (1, 2, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (2, 2, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (3, 2, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (4, 3, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (5, 4, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (6, 2, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (7, 3, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (8, 4, NULL)
GO
INSERT [dbo].[Product] ([ProductID], [ProductCategory], [ProductCategoryGuid]) VALUES (9, 4, NULL)
GO
SET IDENTITY_INSERT [dbo].[Product] OFF
GO

Data looks like this:

ProductID   ProductCategory ProductCategoryGuid
1            2                NULL
2            2                NULL
3            2                NULL
4            3                NULL
5            4                NULL
6            2                NULL
7            3                NULL
8            4                NULL
9            4                NULL

What I would like to achieve is to update [ProductCategoryGuid] column so that all rows with the same value for [ProductCategory] will have the same Guid value in [ProductCategoryGuid] column

To clarify:

Guid values will be generated as a part of the UPDATE query using NEWID() function

Rows with ProductID IN( 1, 2, 3, 6) will have Guid1

Rows with ProductID IN( 4,7) will have Guid2

Rows with ProductID IN( 5,8,9) will have Guid3

Joe Schmoe
  • 1,574
  • 4
  • 23
  • 48
  • This would be a job you should do in the productcategory-table. If you don't have one, create one. – Solarflare Jun 04 '16 at 20:19
  • There is no ProductCategory table. I understand this is a bit counter-intuitive but actual task is more complicated and this is the best way to present it for SO. I will try to clarify: Guid values will be generated as a part of the query using NEWID() function Rows with ProductID IN( 1, 2, 3, 6) will have Guid1 Rows with ProductID IN( 4,7) will have Guid2 Rows with ProductID IN( 5,8,9) will have Guid3 – Joe Schmoe Jun 04 '16 at 20:29
  • Derivative of Bogdan's answer: UPDATE p SET p.ProductCategoryGuid=pc.GD FROM dbo.Product p INNER JOIN (SELECT DISTINCT p2.ProductCategory, NEWID() AS GD FROM dbo.Product p2) AS pc ON p.ProductCategory=pc.ProductCategory – Joe Schmoe Jun 04 '16 at 21:05
  • @JoeSchmoe: I'm not positive sure that this derivated solution is 100% safe. – Bogdan Sahlean Jun 04 '16 at 21:12
  • Logical order of execution of SELECT clauses and options tells DBMS to generate results thus: compute expresions from SELECT clause and the execute DISTINCT. So it's very likely that for every row from dbo.Product will be generated a "new" GUID (NEWID()) and the DISTINCT will try to remove duplicated rows. At this point it higle unlikely that will be duplicated rows because of NEWID(). – Bogdan Sahlean Jun 04 '16 at 21:16
  • I would sau that If derivated solution works then you are lucky. Sry. – Bogdan Sahlean Jun 04 '16 at 21:17
  • Yes, derivative solution I posted works – Joe Schmoe Jun 04 '16 at 21:41

2 Answers2

2

I would use following script that use a table variable to store list of distinct categories. The same table variable has a GUID column having as default value NEWID() function. At the end of the script there is an UPDATE statement using table variable as source and dbo.Product table as target:

DECLARE @Results TABLE (
    [ProductCategory] [int] NOT NULL,
    [ProductCategoryGuid] [uniqueidentifier] NOT NULL DEFAULT (NEWID())
)

INSERT  @Results (ProductCategory)
SELECT  DISTINCT p.ProductCategory
FROM    dbo.Product p

UPDATE  p 
SET     ProductCategoryGuid = r.ProductCategoryGuid
OUTPUT  deleted.ProductCategoryGuid, inserted.ProductCategoryGuid
FROM    dbo.Product p
INNER JOIN @Results r ON p.ProductCategory = r.ProductCategory

Comment OUTPUT clause if you don't want to see the old and new values.

Update: one statement solution (it requires SQL2012+)

;WITH CteUpdateProduct
AS (
    SELECT *, FIRST_VALUE(NewGUID) OVER(PARTITION BY ProductCategory ORDER BY ProductID) AS NewProductCategoryGuid
    FROM (
        SELECT  p.*, NEWID() AS NewGUID
        FROM    dbo.Product p
    ) x
)
UPDATE  CteUpdateProduct
SET     ProductCategoryGuid = NewProductCategoryGuid
OUTPUT  inserted.ProductID, inserted.ProductCategory, inserted.ProductCategoryGuid;
Bogdan Sahlean
  • 19,233
  • 3
  • 42
  • 57
  • Thanks, this works. Now purely for academic reasons I am curious if this can be done within a single update statement. – Joe Schmoe Jun 04 '16 at 20:35
  • The main issue with one statement solution is when with those categories with multiple rows. In this case, SQL Server will "request" for the same categ two or more times associated GUID. This means that for the same categ the `NEWID()` function could be executed twice or more resulting in diff. GUIDs for the same categ. To materialize these GUIDs could be used solution described here: https://stackoverflow.com/questions/13090037/t-sql-cte-materializing-techniques-not-working-on-sql-server-2012/13098687#13098687 – Bogdan Sahlean Jun 04 '16 at 21:03
  • @JoeSchmoe: see second solution albeit somebody could argue isn't 100% safe. – Bogdan Sahlean Jun 04 '16 at 21:11
1
WITH productCategories as (
  SELECT DISTINCT ProductCategory 
  FROM product
), productCategoriesWithGuid as (
  SELECT ProductCategory, NEWID() ProductCategoryGuid
  From productCategories
) 
UPDATE product 
SET ProductCategoryGuid = pc.ProductCategoryGuid
FROM Product p
JOIN productCategoriesWithGuid pc on p.ProductCategory = pc.ProductCategory

This query gets the distinct ProductCategories,

Creates a GUID for each of them,

And lastly updates the product table with the GUIDs

All in one statement.

PeterO
  • 171
  • 8