-1

Is there a way to get a count of distinct values from every table and column in SQL Server.

I've tried using a cursor for this, but that seems insufficient.

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • 2
    Show what you actually tried, and give some more details about why that was _insufficient_ – Luuk Feb 20 '20 at 15:30
  • 1
    Why would you need such a thing? There is no efficient way of doing this. You are literally going to examine every single column of every single table. You could use dynamic sql that is built using sys.tables and sys.columns but even so this is going to be horrifically slow. – Sean Lange Feb 20 '20 at 15:44
  • Try a little searching first. One example for a single table is [here](https://stackoverflow.com/questions/12625729/sql-server-count-number-of-distinct-values-in-each-column-of-a-table) – SMor Feb 20 '20 at 15:56

1 Answers1

1

I've got to agree with Sean and say that this is going to be horrifically slow, but if you really want to do it, then I'm not going to stop you.

Something like this could be used as a starting point if you specifically don't want to use a cursor. This took just under a minute to look at a small database I've got with 10 tables in it. The largest table has just a few million rows in it. No matter what, you're going to be doing some sort of iteration, whether that's a cursor or explicitly reading against the table for each column.

Also, if you want to do something like this, you'll likely need to accommodate for things... like you're not going to be able to use COUNT on xml columns. Like I said, it's a starting point.

DECLARE @cmd VARCHAR(MAX)

SELECT @cmd = 
    STUFF (
        (
            SELECT 
                ' union SELECT ''['+ SCHEMA_NAME(st.schema_id) + '].[' + st.name +']'' as [Object], ''[' + sc.name + ']'' as [Column], COUNT(distinct [' + sc.name + ']) as [Count] FROM [' + SCHEMA_NAME(st.schema_id) + '].[' + st.name + ']'
            FROM sys.tables st
            JOIN sys.columns sc
                ON sc.object_id = st.object_id
            JOIN sys.dm_db_partition_stats ddps
                ON ddps.object_id = sc.object_id
            WHERE 
                ddps.row_count > 0
            FOR XML PATH('')
        ),1,6,''
    ) 

EXECUTE (@cmd) 
GreyOrGray
  • 1,575
  • 8
  • 14
  • Selecting from `INFORMATION_SCHEMA.COLUMNS` will make this slightly more readable. – Luuk Feb 20 '20 at 16:16
  • @Luuk I would disagree that using INFORMATION_SCHEMA.COLUMNS makes this more readable. It would however make this less accurate because the schema in those views is not reliable. https://learn.microsoft.com/en-us/sql/relational-databases/system-information-schema-views/columns-transact-sql?view=sql-server-ver15 – Sean Lange Feb 20 '20 at 16:23
  • I do not see a remark about the schema being unreliable in those docs... – Luuk Feb 20 '20 at 16:26
  • @Luuk did you read it at all? Look at the Description under TABLE_SCHEMA. It has big bold letters that say "** Important ** Do not use INFORMATION_SCHEMA views to determine the schema of an object. The only reliable way to find the schema of a object is to query the sys.objects catalog view." Those views exist for backwards compatibility and are not maintained. Here is a great article that goes deeper into this. https://sqlblog.org/2011/11/03/the-case-against-information_schema-views – Sean Lange Feb 20 '20 at 16:33
  • i tried to run the above code, but get the error Msg 8117, Level 16, State 1, Line 1 Operand data type xml is invalid for count operator. – Juan Vasquez Feb 20 '20 at 16:50
  • @SeanLange: i was thinking this was a correct description: ["The INFORMATION_SCHEMA set of views are the ANSI/ISO standard catalogs for metadata"](https://stackoverflow.com/questions/46187767/difference-between-information-schema-vs-sys-tables-in-sql-server.) – Luuk Feb 20 '20 at 17:25
  • @Juan this was specifically mentioned in the answer that it will not work for count. – Sean Lange Feb 20 '20 at 17:39
  • ok, so maybe i asked this incorrectly i'm just trying to get every unique value for every column in every table. does that make sense ? i'm not even sure why they want this, this came from our vendor.... – Juan Vasquez Feb 21 '20 at 20:13
  • actually, i looked thru the examples above, and this works: – Juan Vasquez Feb 21 '20 at 20:28
  • Ok, i got the above code to work, my question, this is the important part, how can i do this for every column in every table, without having to manually put the table name in ? can that be done with a cursor ? – Juan Vasquez Feb 21 '20 at 20:36