4

I am trying to get a count of non-null values for each column in my table. I have researched the following previously asked SO questions, and have not found a satisfactory answer:

Query to list number of records in each table in a database

get a count of each value from every column in a table SQL Server

SQL Server count number of distinct values in each column of a table

I have written the following code to try to build out a data dictionary for my table to include the name of the column, the number of populated rows in each column, the data type, the length, and whether it is a primary key:

SELECT 
    c.name 'Column Name',
    p.rows 'Row_Count',
    t.Name 'Data type',
    c.max_length 'Max Length',
    ISNULL(i.is_primary_key, 0) 'Primary Key'
FROM    
    sys.columns c
INNER JOIN 
    sys.types t ON c.user_type_id = t.user_type_id
LEFT OUTER JOIN 
    sys.index_columns ic ON ic.object_id = c.object_id AND ic.column_id = c.column_id
LEFT OUTER JOIN 
    sys.indexes i ON ic.object_id = i.object_id AND ic.index_id = i.index_id
LEFT OUTER JOIN
sys.partitions p ON p.OBJECT_ID = i.OBJECT_ID and i.index_id = p.index_id
WHERE
    c.object_id = OBJECT_ID('my_table')

However, the Row_Count column returns all Nulls.

My expected outcome looks like this:

Column_Name Row_Count Data_Type Max_Length Primary_Key
A              10       varchar   50            0
B              10       varchar   50            0
C              7        float     50            0
D              3        float     50            0
E              10       varchar   50            0
thedatasleuth
  • 539
  • 4
  • 22
  • 3
    AFAIK, there isn't a system view/function that will tell you how many non-NULLs a column has. You'll have to use dynamic SQL to `COUNT(Column)` for each column. – Steven Hibble May 01 '19 at 18:39
  • How do you want the data? 1 column per column, or 2 columns (Column Name and Count) and a row for every column? – Thom A May 01 '19 at 18:40
  • My expected outcome is the column names in the first column, the count of populated rows in the second column, the type of data in the third, the length of the data in the fourth, and whether it's a primary key in the fifth – thedatasleuth May 01 '19 at 18:44
  • That's really important information, I think you need to actually show us the result set you want in your question. That's far more than *"I am trying to get a count of non-null values for each column in my table"* – Thom A May 01 '19 at 18:46
  • 1
    you can check the nullability from system tables and if the column doesn't allow nulls use the rowcount from metadata but if the table has any columns allowing null you will have to query the table to get the count for those columns – Martin Smith May 01 '19 at 18:56
  • @MartinSmith Can you show how I would write that query to use the rowcount from the meta data? I don't have Nulls in this table – thedatasleuth May 01 '19 at 18:59
  • @thedatasleuth take as look at this answer https://stackoverflow.com/a/2836803/1744834. BTW, I think output format is a bit strange - if don't have nulls you don't have to count number of values for all columns - it will be the same for every column and should be equal number of rows in the table – Roman Pekar May 01 '19 at 19:03
  • @RomanPekar - just tried this one. It looks like it should work but no values are returned. Thoughts? – thedatasleuth May 01 '19 at 19:05
  • @thedatasleuth sorry but I'm not going to guess why it doesn't work, create an example on sqlfiddle or dbfiddle and update the question, and I'm sure somebody will help you – Roman Pekar May 01 '19 at 19:07
  • I *think* I'm almost there with a solution here, but I won't say it's pretty. :/ – Thom A May 01 '19 at 19:09
  • which version of Sql Server do you use btw? – Roman Pekar May 01 '19 at 20:30

2 Answers2

6

Here is one option that does NOT use Dynamic SQL.

Full disclosure, I suspect DS would be more performant. That said, this would work on any virtually any table, view or query. I'm using master..spt_values as a demonstration

Example

Select ColumnName      = Item
      ,B.column_ordinal
      ,Row_Count       = sum(1)
      ,B.system_type_name
      ,B.max_length
      ,Distinct_Values = count(DISTINCT Value)
 From  (
        Select C.*
         From  master..spt_values A
         Cross Apply ( values (cast((Select A.* for XML RAW) as xml))) B(XMLData)
         Cross Apply (
                        Select Item  = replace(xAttr.value('local-name(.)', 'varchar(100)'),'_x0020_',' ')
                              ,Value = xAttr.value('.','varchar(max)')
                         From  XMLData.nodes('//@*') xNode(xAttr)
                     ) C
       ) A
 Left Join  (
        Select * from sys.dm_exec_describe_first_result_set('Select * from master..spt_values',null,null )  
       ) B on A.Item=B.name
 Group By A.Item
         ,B.system_type_name
         ,B.max_length
         ,B.column_ordinal 
 Order By B.column_ordinal 

Returns

enter image description here

EDIT

As Larnu mentioned this will fail with (var)binary and image. Again, this will NOT perform well on a large table. I've only used such approaches during my discovery phase.

John Cappelletti
  • 79,615
  • 7
  • 44
  • 66
3

God this is ugly...

DECLARE @SQL nvarchar(MAX);
DECLARE @Table sysname = 'SampleTable';
DECLARE @Schema sysname = 'dbo';

SET @SQL = N'WITH Counts AS (' + NCHAR(13) + NCHAR(10) + 
           N'    SELECT @Schema AS SchemaName,' + NCHAR(13) + NCHAR(10) +
           N'           @Table AS TableName,' +
           STUFF((SELECT N',' + NCHAR(13) + NCHAR(10) + 
                         N'           COUNT(' + QUOTENAME(C.COLUMN_NAME) + N') AS ' + QUOTENAME(COLUMN_NAME)
                  FROM INFORMATION_SCHEMA.COLUMNS C
                  WHERE C.TABLE_SCHEMA = @Schema
                    AND C.TABLE_NAME = @Table
                  FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,14,N'') + NCHAR(13) + NCHAR(10) + 
           N'    FROM ' + QUOTENAME(@Table) + N')' + NCHAR(13) + NCHAR(10) + 
           N'SELECT V.ColumnName,' + NCHAR(13) + NCHAR(10) + 
           N'       V.NonNullCount,' + NCHAR(13) + NCHAR(10) + 
           N'       ISC.DATA_TYPE + ISNULL(DT.S,'''') AS Datatype,' + NCHAR(13) + NCHAR(10) +
           N'       ISNULL(PK.IsPrimaryKey,''No'') AS PrimaryKey' + NCHAR(13) + NCHAR(10) +
           N'FROM Counts C' + NCHAR(13) + NCHAR(10) + 
           N'     CROSS APPLY(VALUES' + STUFF((SELECT N',' + NCHAR(13) + NCHAR(10) + 
                                                      N'                       (N' + QUOTENAME(C.COLUMN_NAME,'''') + N',C.' + QUOTENAME(C.COLUMN_NAME) + N')'
                                               FROM INFORMATION_SCHEMA.COLUMNS C
                                               WHERE C.TABLE_NAME = @Table
                                               FOR XML PATH(N''),TYPE).value('.','nvarchar(MAX)'),1,26,N'') + N')V(ColumnName,NonNullCount)' + NCHAR(13) + NCHAR(10) +
           N'     JOIN INFORMATION_SCHEMA.COLUMNS ISC ON C.SchemaName = ISC.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) +
           N'                                        AND C.TableName = ISC.TABLE_NAME' + NCHAR(13) + NCHAR(10) +
           N'                                        AND V.ColumnName = ISC.COLUMN_NAME' + NCHAR(13) + NCHAR(10) + 
           N'     CROSS APPLY (VALUES(''('' + STUFF(CONCAT('','' + CASE ISC.CHARACTER_MAXIMUM_LENGTH WHEN -1 THEN ''MAX'' ELSE CONVERT(varchar(4),ISC.CHARACTER_MAXIMUM_LENGTH) END,' + NCHAR(13) + NCHAR(10)+
           N'                                            '','' + CASE WHEN ISC.DATA_TYPE NOT LIKE ''%int'' THEN CONVERT(varchar(4),ISC.NUMERIC_PRECISION) END,' + NCHAR(13) + NCHAR(10) +
           N'                                            '','' + CASE WHEN ISC.DATA_TYPE NOT LIKE ''%int'' THEN CONVERT(varchar(4),ISC.NUMERIC_SCALE) END,' + NCHAR(13) + NCHAR(10) +
           N'                                            '','' + CONVERT(varchar(4),ISC.DATETIME_PRECISION)),1,1,'''') + '')'')) DT(S)' + NCHAR(13) + NCHAR(10) +
           N'     OUTER APPLY(SELECT ''Yes'' AS IsPrimaryKey ' + NCHAR(13) + NCHAR(10) + 
           N'                 FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS TC' + NCHAR(13) + NCHAR(10) + 
           N'                      JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE KCU ON TC.TABLE_SCHEMA = KCU.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) + 
           N'                                                                  AND TC.TABLE_NAME = KCU.TABLE_NAME' + NCHAR(13) + NCHAR(10) + 
           N'                                                                  AND TC.CONSTRAINT_NAME = KCU.CONSTRAINT_NAME' + NCHAR(13) + NCHAR(10) + 
           N'                 WHERE TC.CONSTRAINT_TYPE = ''PRIMARY KEY''' + NCHAR(13) + NCHAR(10) + 
           N'                   AND KCU.COLUMN_NAME = V.ColumnName' + NCHAR(13) + NCHAR(10) + 
           N'                   AND TC.TABLE_SCHEMA = ISC.TABLE_SCHEMA' + NCHAR(13) + NCHAR(10) + 
           N'                   AND TC.TABLE_NAME = ISC.TABLE_NAME) PK;';

PRINT @SQL; --Might need to use SELECT here
--SELECT @SQL;
EXEC sp_executesql @SQL, N'@Schema sysname,@Table sysname',@Schema = @Schema, @Table = @Table;

db<>fiddle

Honestly, there is a lot going on here. If it needs an explanation, I will try, but it will take some time, so (and no offence), i'm not going to put the effort in if no one is going to want/need to know.

One note, I have been slightly lazy and no joined on SCHEMA_NAME. If you are using multiple schemas, with same named objects, this will have problem, and will need to be addressed.

Edit: apparently I'm a glutton for punishment. Fixed the Schema "problem" and added some logic in regards to ints

New Fiddle

Thom A
  • 88,727
  • 11
  • 45
  • 75