SELECT DISTINCT with composite key

Question

This is a simplification of the problem I have, but using this example I want to get all the product records for a given customer. Give the customer may have many orders and some orders may be for the same product I don't want to pull back duplicate products. The product table also has a composite key on it.

In an ideal world this would be simple and the following Query would do the job. However as the table contains an nText column, SQL Server complains that "The ntext data type cannot be selected as DISTINCT because it is not comparable."

SELECT distinct p.idA, p.idB, p.descriptionNTEXT
FROM p product, o order, c customer
WHERE o.productID = p.idA AND o.subProductID = p.idB AND o.customerID = c.ID

I would like to re-structure this to something like

SELECT p1.idA, p1.idB, p1.descriptionNTEXT
FROM   p1 product
WHERE (p1.idA, p1.idB) IN
   (
       SELECT p.idA, p.idB
       FROM p product, o order, c customer
       WHERE o.productID = p.idA AND o.subProductID = p.idB AND o.customerID = c.ID
   )

See How do I (or can I) SELECT DISTINCT on multiple columns?

However this approach does not work on SQL Server

Any Ideas?

Ideally I need a portable solution, something that will at least work on SQLServer, MySQL & Oracle. But DB Specific solutions are better than nothing!

The following data shows roughly what I want.

Customer
ID name
1  Fred

Order
ID  CustomerID productID subProductID
10  1          100       200
11  1          100       200
12  1          100       200
13  1          101       201

Product
IDA  IDB descriptionNTEXT
100  200 'product 1'
101  201 'product 2'    
102  203 'product 3'    

Expected Result
IDA  IDB descriptionNTEXT
100  200 'product 1'
101  201 'product 2'

Add some sample table data and the expected result - as well formatted text. — jarlh, Nov 14 '16 at 11:00

David דודו Markovitz · Answer 1 · 2016-11-14T11:22:05.967

6

Cast NTEXT to NVARCHAR(MAX)

SELECT distinct p.idA, p.idB, cast (p.descriptionNTEXT as nvarchar(max))
FROM p product, o order, c customer
WHERE o.productID = p.idA AND o.subProductID = p.idB AND o.customerID = c.ID

IMPORTANT! ntext, text, and image data types will be removed in a future version of SQL Server. Avoid using these data types in new development work, and plan to modify applications that currently use them. Use nvarchar(max), varchar(max), and varbinary(max) instead.

https://msdn.microsoft.com/en-us/library/ms187993.aspx

nvarchar [ ( n | max ) ] Variable-length Unicode string data. n defines the string length and can be a value from 1 through 4,000. max indicates that the maximum storage size is 2^31-1 bytes (2 GB). The storage size, in bytes, is two times the actual length of data entered + 2 bytes. The ISO synonyms for nvarchar are national char varying and national character varying.

https://msdn.microsoft.com/en-us/library/ms186939.aspx

edited Nov 14 '16 at 11:22

answered Nov 14 '16 at 11:07

David דודו Markovitz

42,900
6
64
88

Thanks, I guess that would truncate the description if was over 4000/8000 chars on older SQL versions? It also needs to work with blobs. Unfortunately avoiding the types isn't an option as this is for a tool that will operate on 3rd party databases. But good to know there being replaced, they were always a source of problems and incompatibilities. – Sprotty Nov 14 '16 at 11:20
@Sprotty, nvarchar(max) limitation is 2G (see edited answer). For BLOB use varbinary(max) https://msdn.microsoft.com/en-us/library/ms188362.aspx – David דודו Markovitz Nov 14 '16 at 11:24
is the uniqueness given by any unique IDA IDB combination? That seems a good way of establishing a distinct row - your example SQL seemed to be on sensible lines, except you can't use IN for two fields - but you can switch to EXISTS (which will work for zero, 1 or more fields) - then you perhaps don't need to worry about converting BLOB, ntext etc - using the product description to determine a unique might not be the best method. – Cato Nov 14 '16 at 11:40
1

@Sprotty _"for a tool that will operate on 3rd party databases"_ - `SELECT DISTINCT SEVERAL_GIGABYTES`? The problem is not with types. The problem is with solution. You have to find different way to solve your task. – Ivan Starostin Nov 14 '16 at 11:59
@Ivan - I think he might be 1 millimetre from a better solution - description is probably unique for any unique idA and idB - he simply wants to list the unqiue idA and idB combinations – Cato Nov 14 '16 at 12:04
Apparently the IN (idA,idB) syntax is good on Postgresql (but not SQLSever). But looking at the answer from @Cata, that seems to re-write it in a form that looks pretty vanilla and functional looks to be exactly what I want. – Sprotty Nov 14 '16 at 12:52

score 2 · Accepted Answer · answered Nov 14 '16 at 11:18

2

looking at your code, you seem to want to use IN over multiple fields - this can be achieved via EXISTS - it asks if the specified query returns any row - so based on your query it might be something like

SELECT p1.idA, p1.idB, p1.descriptionNTEXT
FROM   p1 product
WHERE EXISTS
   (
       SELECT 1
       FROM p product, o order, c customer
       WHERE o.productID = p.idA AND o.subProductID = p.idB AND o.customerID = c.ID 
             AND p1.idA = p.ida 
             AND p1.idB = p.idb
   )

answered Nov 14 '16 at 11:18

Cato

3,652
9
12

Thanks, this looks like a clean re-factoring of the original query, and seems to be vanilla SQL that would not put a huge overhead on the query. – Sprotty Nov 14 '16 at 13:01
Not exactly my use case but the idea of `WHERE EXISTS( SELECT 1 ...)` led me into the right direction. Very smart :) – ferdynator Sep 21 '17 at 13:04

George Sovetov · Answer 3 · 2016-11-14T13:36:58.693

0

First do distinct, then join to products.

select p.*
    from (
        select distinct CustomerID, productID, subProductID
        from [Order]
    ) do
    join Product p on p.productID = do.IDA and p.subProductID = do.IDB

edited Nov 14 '16 at 13:36

answered Nov 14 '16 at 11:12

George Sovetov

4,942
5
36
57

I thought that looks like a plausible solution, although syntax may not be SQL SERVER - however question mentions DBs in general - maybe could have done with a comment when downvoted? – Cato Nov 14 '16 at 11:53
@Cato I don't really expect that this exact query would work in SQL Server but I believe that it should work in principle. Which parts of my query would not work with SQL Server? – George Sovetov Nov 14 '16 at 12:41
maybe the 'Product p on (p.productID, p.subProductID) = (do.IDA, do.IDB)' part, or it could be I just never saw that before or it is newer SQL version – Cato Nov 14 '16 at 13:27
@Cato Indeed, tuple comparison is not supported in MS SQL Server as I learnt after googling. I fixed query. – George Sovetov Nov 14 '16 at 13:39

score 0 · Answer 4 · answered Nov 14 '16 at 12:09

0

You can try this without converting it to another type

SELECT p1.idA, p1.idB, p1.descriptionNTEXT
from product p1 where p1.IDA in( SELECT Distinct p.idA FROM  product p,order o,customer c WHERE o.productID = p.idA AND o.subProductID = p.idB AND o.customerID = c.ID)

answered Nov 14 '16 at 12:09

mansi

837
5
12

This looks like it would work with the limited test data I provided, but would not work as expected if IDA is repeated with different IDB values. – Sprotty Nov 14 '16 at 12:57

SELECT DISTINCT with composite key

4 Answers4