1

I'd like to create a dynamic select that returns every distinct value for each column in a wide table. I.e.

select distinct @mycolumn 
from @mytable

for every column and the results combined to a single table.

Edit1:

Example: Example

Edit2: The order of the returned data won't matter, and the source table can have all sorts of data types.

Any advice appreciated, thank you!

gofr1
  • 15,741
  • 11
  • 42
  • 52
GetStacked
  • 29
  • 1
  • 5
  • 2
    Won't your columns' data types differ? – Heinzi Jun 08 '16 at 11:02
  • 3
    Show example data and desired results. – Martin Smith Jun 08 '16 at 11:04
  • 2
    I can't imagine any scenario where that would be useful. How would you combine them anyway? I'm guessing every column have a different number of distinct values.... – Zohar Peled Jun 08 '16 at 11:04
  • Yes the data types will differ. – GetStacked Jun 08 '16 at 11:04
  • Show us sample table data, and the expected result! – jarlh Jun 08 '16 at 11:05
  • I'm assuming the order of the data doesn't matter, since there can realistically be no relation between the distinct values of one column vs. the next? – Kahn Jun 08 '16 at 11:27
  • @Kahn that is correct – GetStacked Jun 08 '16 at 11:29
  • what happens in the "date" column is precise to a millisecond? do you still want that level of precision for that column? or a float data type, how "precise" should the "distinct values" be? Wll you apply this logic to long text strings also? 200 characters, 2000 characters, more? should you restrict the list by data types perhaps? what is the purpose of the expected result? – Paul Maxwell Jun 08 '16 at 11:41
  • @Used_By_Already the purpose of the expected result is to be able to showcase the contents of each column in a list type format. Some of the columns, such as the float and datetime columns are irrelevant in this particular case, and they would be filtered out from the query. – GetStacked Jun 08 '16 at 11:48
  • Why not just do separate "select distinct" queries on each column you want to showcase, and then paste the results next to each other into a spreadsheet? Since there's no relation between the values anyway, I think you won't be able to come up with any SQL that will do all of them simultaneously. – ADyson Jun 08 '16 at 12:09
  • 1
    This problem is needlessly hard if you demand the result be returned as a single table, because in SQL terms such a table really doesn't make any sense (there's zero relation between the rows). My advice is to not do this, but instead use a single table per column. You can return this to the client as a series of result sets, and the client can then stuff the result in a single table (if desired). T-SQL has no "just append this column to this table" operation like Excel does. – Jeroen Mostert Jun 08 '16 at 12:10
  • Actually @JeroenMostert, I think it's a cool problem. Business-wise it makes no sense, true. And this is the kind of stuff that should be cleaned and parsed in the front app, excel or whatever. But it's still a great opportunity to do customized stuff with SQL Server simply because you can. :) Keep in mind, we don't know the business requirements for this person so we simply don't know enough to judge at this point. – Kahn Jun 08 '16 at 12:41

2 Answers2

2

The only way I can think of is very cumbersome and probably extremely slow: Using a Tally table (I've generated one using a recursive cte for the sake of this answer, but that's also not a very good way to do that...) and multiple derived tables left joined to that tally table I was able to come up with something that will generate the desired output.
However, as I wrote on the top - it's very cumbersome and probably extremely slow (I've tested only on a table with 5 columns and 6 rows so I have no idea about execution speed).

DECLARE @Count int
select  @Count = COUNT(1) 
FROM YourTable

;with tally as (
    select 1 as n
    union all 
    select n + 1
    from tally 
    where n < @Count 
)

SELECT Column1, Column2, Column3, Column4, Column5
FROM tally 
LEFT JOIN 
(
    SELECT Column1, ROW_NUMBER() OVER (ORDER BY Column1) rn
    FROM 
    (
        SELECT DISTINCT Column1
        FROM YourTable 
    ) t1
) d1 ON(n = d1.rn)
LEFT JOIN 
(
    SELECT Column2, ROW_NUMBER() OVER (ORDER BY Column2) rn
    FROM 
    (
        SELECT DISTINCT Column2
        FROM YourTable 
    ) t1
) d2 ON(n = d2.rn)
LEFT JOIN 
(
    SELECT Column3, ROW_NUMBER() OVER (ORDER BY Column3) rn
    FROM 
    (
        SELECT DISTINCT Column3
        FROM YourTable 
    ) t1
) d3 ON(n = d3.rn)
LEFT JOIN 
(
    SELECT Column4, ROW_NUMBER() OVER (ORDER BY Column4) rn
    FROM 
    (
        SELECT DISTINCT Column4
        FROM YourTable 
    ) t1
) d4 ON(n = d4.rn)
LEFT JOIN 
(
    SELECT Column5, ROW_NUMBER() OVER (ORDER BY Column5) rn
    FROM 
    (
        SELECT DISTINCT Column5
        FROM YourTable 
    ) t1
) d5 ON(n = d5.rn)

Dynamic version:

DECLARE @TableName sysname = 'YourTableName'

DECLARE @Sql nvarchar(max) = 
'
DECLARE @Count int
select  @Count = COUNT(1) 
FROM '+  @TableName +'

;with tally as (
    select 1 as n
    union all 
    select n + 1
    from tally 
    where n < @Count 
)

SELECT '

SELECT @Sql = @Sql + Column_Name +','
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = @TableName

SELECT @Sql = LEFT(@Sql, LEN(@Sql) - 1) + ' FROM tally t'

SELECT @Sql = @Sql + ' LEFT JOIN (SELECT '+ Column_Name +', ROW_NUMBER() OVER (ORDER BY ' + Column_Name +') rn
  FROM 
  (
    SELECT DISTINCT '+ Column_Name +' FROM '+ @TableName +') t
  ) c_'+ Column_Name + ' ON(n =  c_'+ Column_Name + '.rn)'
FROM INFORMATION_SCHEMA.COLUMNS 
WHERE TABLE_NAME = @TableName

EXEC(@Sql)

Update

Tested on a table with 22 columns and 47,000 rows, my suggestion took 46 seconds when using a proper tally table. on Sql server 2014. I was surprised - I thought it would take at least 2-3 minutes.

Community
  • 1
  • 1
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • You're almost there. Now do it dynamically. :-P – Jeroen Mostert Jun 08 '16 at 12:15
  • Turning this into a dynamic sql is not that hard, but I doubt it will be any better then the dynamic solution proposed by Kahn in the other answer. – Zohar Peled Jun 08 '16 at 12:32
  • @JeroenMostert: Challenge excepted. – Zohar Peled Jun 08 '16 at 12:58
  • Alright, now do it in a single query. `FOR XML` should help you out here... actually, I'm not serious. This is bad enough as it is. (The problem, I mean, not your solution.) – Jeroen Mostert Jun 08 '16 at 13:02
  • Ooh, great idea. I love the whole recursive CTE thing for dynamically managing the row numbers. One tiny point tho, EXEC @sql will fail with strings over 8k characters long, which could become an issue with a wide table. That's why I used sp_executesql instead. :) – Kahn Jun 08 '16 at 13:11
  • 1
    @Kahn - There is no such limitation with `exec` – Martin Smith Jun 08 '16 at 15:47
  • Ok well this is awkward. I can't find any info on the limitation. Yet I'm certain that about a year ago I made a dynamic trigger batch script that would build and execute a string over 8000 characters in length, and had to switch to sp_executesql because exec wouldn't take it. But it seems my memory must be wrong. Nevermind then. :) – Kahn Jun 09 '16 at 09:17
1

Here's a dynamic set I was working on. I'm running out of time so it's not cleaned up, and it determines the dynamic row numbers by the max number of rows in the table as a whole, meaning that if you have any duplicates in any column at all, you'll be left with rows where every single column is null.

But other than that, this should work perfectly fine, and the script contains the necessary info showing you how to concatenate a final "WHERE S1.COLNAME IS NOT NULL AND S2.COLNAME IS NOT NULL AND .." filter to the result table, to eliminate those full-null rows.

Other than that, here's the script. It's gonna be heavy, obviously, so I included a (nolock) hint in it, and a "WHERE ColName is not null" to remove useless results.

Try this on a smaller table and see it work.

/*
Set your table and schema on @MYTABLE and @MYSCHEMA variables.
*/
SET NOCOUNT ON

DECLARE @MYTABLE SYSNAME = 'Mytablename here'
    , @MYSCHEMA sysname = 'dbo'

DECLARE @SQL NVARCHAR(MAX) = '', @COLNAME sysname = '', @MYCOLS NVARCHAR(max) = ''

DECLARE @COL_NOW INT = 1, @COL_MAX INT = 
    (SELECT COUNT(*) 
    FROM sys.columns
    WHERE object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA))

SELECT @COLNAME = name 
    FROM sys.columns 
    WHERE column_id = 1
    and object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA)

SET @SQL = 'FROM 
    (SELECT ROW_NUMBER() OVER (ORDER BY '+@COLNAME+' ASC) RN
    FROM '+@MYSCHEMA+'.'+@MYTABLE+' (nolock)) S'

WHILE @COL_NOW <= @COL_MAX
BEGIN

    SELECT @COLNAME = name 
    FROM sys.columns 
    WHERE column_id = @COL_NOW 
    and object_id = (SELECT object_id FROM sys.tables where name = @MYTABLE and SCHEMA_NAME(schema_id) = @MYSCHEMA)

    SELECT @SQL = @SQL+'
FULL JOIN 
    (SELECT DISTINCT DENSE_RANK() OVER (ORDER BY '+@COLNAME+' ASC) RN, '+@COLNAME+'
    FROM '+@MYSCHEMA+'.'+@MYTABLE+' (nolock) 
    WHERE '+@COLNAME+' IS NOT NULL) S'+CAST(@COL_NOW AS NVARCHAR(25))+' ON S'+CAST(@COL_NOW AS NVARCHAR(25))+'.RN = S.RN'

    IF @COL_NOW = 1
        SELECT @MYCOLS = @MYCOLS+' S'+CAST(@COL_NOW AS NVARCHAR(25))+'.'+@COLNAME           
    ELSE
        SELECT @MYCOLS = @MYCOLS+', S'+CAST(@COL_NOW AS NVARCHAR(25))+'.'+@COLNAME

    SET @COL_NOW = @COL_NOW+1

END

SELECT @SQL = 'SELECT'+@MYCOLS+'
'+@SQL+'
ORDER BY S1.RN ASC';

--PRINT(@SQL); -- To check resulting dynamic SQL without executing it (Warning, print will only show first 8k characters)

EXEC sp_executesql @SQL;
GO
Kahn
  • 1,630
  • 1
  • 13
  • 23
  • Is there a way to exclude specific column names? I couldn't work it out without getting dynamic SQL errors. – PeterX Jun 14 '22 at 00:45