0

Suppose database.tbl contains two fields: fruit and price.

  • Row 1: fruit='apple', price=0.60.
  • Row 2: fruit='pear', price=0.60.
  • Row 3: fruit='peach', price=0.50.

I want to run a query which returns the count of distinct values for each field, i.e. it should return

  • fruit 3
  • price 2

because there are 3 fruits, but only 2 distinct prices.

I've tried

SELECT C.COLUMN_NAME, COUNT(DISTINCT C.COLUMN_NAME) FROM tbl JOIN (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS  WHERE TABLE_SCHEMA='database' AND TABLE_NAME='tbl') AS C GROUP BY C.COLUMN_NAME;

but this returns:

  • fruit 1
  • price 1
oks
  • 304
  • 2
  • 13
  • It's only returning 1 because you're counting the column name, not the data in the column. Does this have to work without knowing what columns you want the count for? – Evan OJack Feb 09 '15 at 20:12
  • Thanks. Yes, it has to work without knowing the columns in advance. All it knows is that it must count for all columns. – oks Feb 09 '15 at 20:14
  • Is it acceptable to use a stored procedure to generate the results? – Evan OJack Feb 09 '15 at 20:21
  • Yes, a stored procedure would be fine. – oks Feb 09 '15 at 20:27

1 Answers1

1

The only way I know to do it is to take the column names from the INFORMATION_SCHEMA query and put them in a temp table. Then iterate over the column names in the temp table and execute dynamic sql to query the distinct values in those columns. Here is an example in T-SQL, so you'll have to make some changes.

CREATE TABLE #values
(
    COLUMN_NAME VARCHAR(100),
    DISTINCT_COUNT INT
)

DECLARE @column AS VARCHAR(100) = ''

SELECT COLUMN_NAME INTO #columns FROM INFORMATION_SCHEMA.COLUMNS  WHERE TABLE_SCHEMA='dbo' AND TABLE_NAME='tbl'

SELECT  TOP 1@column = COLUMN_NAME
FROM    #columns
WHERE   COLUMN_NAME > @column
ORDER BY COLUMN_NAME ASC

WHILE (@@ROWCOUNT > 0)
BEGIN
    DECLARE @sql AS NVARCHAR(1000) = N'INSERT INTO #values (COLUMN_NAME, DISTINCT_COUNT) SELECT ''' + @column + ''', COUNT(DISTINCT ' + @column + N') FROM [db].[dbo].[tbl]'
    EXECUTE sp_executesql @sql

    SELECT  TOP 1 @column = COLUMN_NAME
    FROM    #columns
    WHERE   COLUMN_NAME > @column
    ORDER BY COLUMN_NAME ASC
END

SELECT * FROM #values

DROP TABLE #columns
DROP TABLE #values

This is pretty rough and just an example of how it can be done. If this were production code, I'd make sure the variable types are all consistent and make sure the column names are safe. This will help: How To have Dynamic SQL in MySQL Stored Procedure

Community
  • 1
  • 1
Evan OJack
  • 567
  • 2
  • 5