Finding Common Sets of data by comparing sub-details

Question

I have a table which details the sales details, the product and quantity with some other details, shown below.

SalesOrderID    ProductID   OrderQty
           1          762          1
           1          758          1
           2          760          1
           3          710          3
           3          773          1
           3          778          1
           3          775          1
           4          747          1
           5          710          1
           5          709          2
           5          773          2
           5          776          1
           6          762          1
           6          758          1

I am trying to mark / group sales orders which are same i.e. they have the same list of ProductID with the same OrderQty. For example, in the above case, SalesOrderID with 1 and 6 are for instance the same.

I have, so far, tried to group the data by the combination of ProductID & OrderQty and then tried to sum over SalesOrderID, but clearly this is not a reliable solution.

With TempView as (
  Select SalesOrderID, ProductID, OrderQty,
    DENSE_RANK() OVER(ORDER BY ProductID, OrderQty) as 'RANK_P_W_Q'
    FROM SalesRecords
), SumBySalesID as (
  Select SalesOrderID, ProductID, OrderQty, SUM(RANK_P_W_Q)
    OVER(PARTITION BY SalesOrderID ORDER BY SalesOrderID 
     ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as 'SalesMarker'
  FROM TempView
) Select SalesOrderID, ProductID, OrderQty,
    DENSE_RANK() OVER(ORDER BY SalesMarker) as 'SalesGroup'
    FROM SumBySalesID

The next solution would be to pivot, the table along ProductID and then it would be a straight forward, to group by SalesOrderID, but, I cannot rely on this stratergy because, Pivot would usually result to have more than 1024 columns, which is the limit for normal tables.I know about wide-tables but I don't want to go that route as I have to manage the data length along the rows, by 8,096 Bytes and I would like to keep the script as simple as possible.

I hope somebody could point me in the right direction.

what version of mssql? – Juan Carlos Oropeza Sep 17 '19 at 18:35 — Juan Carlos Oropeza, Sep 17 '19 at 18:35

Juan Carlos Oropeza · Accepted Answer · 2019-09-17T18:52:29.100

In MSSQL 2017 just use STRING_AGG function, 2016 check here how emulate it.

SQL DEMO

First create a key for each order with products and quantities.

SELECT [SalesOrderID]
     , STRING_AGG (CAST([ProductID] as nvarchar(100))  
                 + '-' 
                 + CAST([OrderQty] as nvarchar(100)), ',')  
                 WITHIN GROUP (ORDER BY [ProductID] ASC) AS csv  
FROM Sales
GROUP BY [SalesOrderID];

OUTPUT

| SalesOrderID |                     csv |
|--------------|-------------------------|
|            1 |             762-1,758-1 |
|            2 |                   760-1 |
|            3 | 710-3,773-1,778-1,775-1 |
|            4 |                   747-1 |
|            5 | 710-1,709-2,773-2,776-1 |
|            6 |             762-1,758-1 |

Then group each key together to find duplicates:

with cte as (
  SELECT [SalesOrderID]
       , STRING_AGG (CAST([ProductID] as nvarchar(100))  
                   + '-' 
                   + CAST([OrderQty] as nvarchar(100)), ',')  
                   WITHIN GROUP (ORDER BY [ProductID] ASC) AS csv
  FROM Sales
  GROUP BY [SalesOrderID]
) 
SELECT *, rank() over (ORDER BY csv) as grp
FROM CTE

OUTPUT

| SalesOrderID |                     csv | grp |
|--------------|-------------------------|-----|
|            5 | 710-1,709-2,773-2,776-1 |   1 |
|            3 | 710-3,773-1,778-1,775-1 |   2 |
|            4 |                   747-1 |   3 |
|            2 |                   760-1 |   4 |
|            1 |             762-1,758-1 |   5 |
|            6 |             762-1,758-1 |   5 |

Finding Common Sets of data by comparing sub-details

1 Answers1