How should count and distinct be used together?

Question

In SQL Server, I have a database mydb, and a table dbo.mytable in it.

I want to see if three columns (Contract_Number, Payment_Number, Task_Number) of the table can form a candidate key, by https://stackoverflow.com/a/34468508/156458

 SELECT count (DISTINCT [Contract_Number], [Payment_Number], [Task_Number])
 FROM [mydb].[dbo].[mytable]

but the execution gives an error:

Msg 102, Level 15, State 1, Line 1
Incorrect syntax near ','.

I was wondering what is going wrong?

Without count, it works fine. How should count and distinct be used together?

I also tried:

  SELECT count(*)
  FROM (SELECT DISTINCT [Contract_Number], [Payment_Number], [Task_Number]
  FROM [PAD_Dev].[dbo].[Line_Level_Custom])

but the execution gives an error:

  Msg 102, Level 15, State 1, Line 3
  Incorrect syntax near ')'.

Thanks.

What are you trying to achieve here? Outside of a aggregate function, `DISTINCT` effects the dataset; every single *row* must have a distinct set of values. Aggregate functions, like `COUNT`, expect a **scalar** expression to `COUNT`, which `[Contract_Number], [Payment_Number], [Task_Number]` is not. Perhaps you want something like `COUNT(DISTINCT CONCAT([Contract_Number], [Payment_Number], [Task_Number]))` or `COUNT(DISTINCT CONCAT_WS(',',[Contract_Number], [Payment_Number], [Task_Number]))`? — Thom A, May 27 '22 at 16:15
https://stackoverflow.com/a/34468508/156458 uses count directly outside distinct — Tim, May 27 '22 at 16:18
SQL Server doesn't support tuples of multiple columns there. Not sure if other RDBMS do or if that was just pseudo code — Martin Smith, May 27 '22 at 16:20
How I would do this in practice depends on what else I need to determine in the same query. If I just wanted the scalar value I would probably do `SELECT COUNT(*) FROM (SELECT DISTINCT [Contract_Number], [Payment_Number], [Task_Number] FROM [mydb].[dbo].[mytable]) X ` — Martin Smith, May 27 '22 at 16:24
That isn't the code I posted. You are missing the required alias — Martin Smith, May 27 '22 at 16:39
`SELECT [Contract_Number], [Payment_Number], [Task_Number], COUNT(*) FROM [mydb].[dbo].[mytable] GROUP BY [Contract_Number], [Payment_Number], [Task_Number] HAVING COUNT(*) > 1` — shawnt00, May 27 '22 at 17:15
You might find a scenario where the query is ambiguous without a alias. Mostly though it's just required syntax. Anonymous types is a newish trend in modern languages and SQL is quite old too...? — shawnt00, May 27 '22 at 17:34
@shawn could you tell me what scenarios? What's anonymous type? — Tim, May 27 '22 at 18:03
I was trying to imagine a scenario whether the syntax might be ambiguous and thus the rule became that it was required. I don't have an example and ultimately it's a minor irritation we've always had. As for "anonymous types" there are languages like C++, C#, and Java where you can make a new type on the fly (aka "a table" as a rough parallel) without giving it a name. This is all just my attempt at historical perspective though. — shawnt00, May 27 '22 at 19:05

score 0 · Answer 1 · edited May 28 '22 at 10:45

Why don't you play with Row_Number ?

;with CTE as
(
SELECT row_number()over(partition  by [Contract_Number] order by [Contract_Number]) rownum
,[Contract_Number], [Payment_Number], [Task_Number])
 FROM [mydb].[dbo].[mytable]
 )
 select * from CTE where rownum=1

OR Count function with partition

SELECT
COUNT(*) OVER (PARTITION BY [Contract_Number]) [Contract_Number],
[Payment_Number], [Task_Number]
FROM [mydb].[dbo].[mytable]

*Query is not tested

How should count and distinct be used together?

1 Answers1