2

I am trying to, using SSIS, obtain a table to get the percentage of missing values of every field in every table of a SQL Server database.

Ideally I would like to create a new table in another database with 4 fields

Table / Field / Type / PctMissingValues

with one row for each field of a table. Type would be the field type

My idea was to use foreach loop containers to loop through tables and fields, with inside the container a Data flow task consisting of OLE DB Source > Aggregate > OLE DB Destination but I can't figure out how to do this, I am new to SSIS and ETLs in general

Hadi
  • 36,233
  • 13
  • 65
  • 124
fmarm
  • 4,209
  • 1
  • 17
  • 29
  • I don't have answer for you, but here some links that might help. Each of them look like they will operate on just one table. So you would have to query all your tables from sys.tables within your container or you could use a cursor and do it all within a query and not use SSIS at all. https://dba.stackexchange.com/questions/14864/test-if-any-columns-are-null https://blogs.msdn.microsoft.com/samlester/2012/09/22/tsql-solve-it-your-way-finding-the-percentage-of-null-values-for-each-column-in-a-table/ – Isaac Oct 24 '19 at 02:09
  • Why SSIS? Personally if I wanted to do this I'd just go down the route of using dynamic SQL, that way you can check all the columns in a single table with one query which should be faster. – BarneyL Oct 24 '19 at 07:12
  • @BarneyL Thank you for your comment.Never heard of dynamic SQL, I will look into it. I want to use SSIS because my client specifically asked us to use it to solve this problem – fmarm Oct 24 '19 at 10:33
  • @Isaac Thank you, the second link seems promising I will look at it more closely – fmarm Oct 24 '19 at 10:34
  • @Isaac - What do you mean exactly by "missing values" is this NULL field values, empty strings or something that specifically represents unknown (like -1 in a data warehouse often does)? – BarneyL Oct 24 '19 at 14:09
  • @BarneyL I am assuming your comment about "missing values" was intended for the OP, not me. – Isaac Oct 24 '19 at 15:20
  • 1
    @Isaac, yes. Just misread from the thread. Was intended for fmarm. – BarneyL Oct 24 '19 at 15:24

2 Answers2

2

The following SQL query generates one query per column in a database that counts total rows and rows where the value is NULL.

You can load this in to a variable and loop through it in SSIS running the statement in each row one at a time and logging the results form that query out to another table.

SELECT 
 OBJECT_SCHEMA_NAME(C.object_id) AS TableSchema
,OBJECT_NAME(C.object_id) AS TableName
,C.name AS ColumnName
,'SELECT COUNT(*) AS TotalRows, COUNT(IIF([' +C.name+ '] IS NULL,1,NULL)) AS NullRows 
    FROM [' + OBJECT_SCHEMA_NAME(C.object_id) + '].[' + OBJECT_NAME(C.object_id) + ']' AS CountQuery
FROM sys.columns AS C
INNER JOIN sys.tables AS T
    ON C.object_id = T.object_id
BarneyL
  • 1,332
  • 8
  • 15
1

You can use SSIS data profiling task to get the NULL value ratio in a table. There are many links online for a step-by-step guide:

Hadi
  • 36,233
  • 13
  • 65
  • 124