0

I need to find all columns that have 5 or more distinct values. Now my query is like:

 SELECT TABLE_NAME,COLUMN_NAME, DATA_TYPE
 FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'MY_SCHEMA' 
 AND   TABLE_NAME IN ('TABLE_1', 'TABLE_2', 'TABLE_3')

I thought it could be done like simple subquery. Something like:

*code above*
AND (select count(distinct COLUMN_NAME) FROM TABLE_SCHEMA + TABLE_NAME) > 5

I just recently started to learn SQL and thought this kind of thing is easy, but still I can't figure out right query.

Ville
  • 57
  • 9
  • 1
    The INFORMATION_SCHEMA tables contain _meta data_, e.g. information about table and column names, data types etc. – jarlh Nov 21 '22 at 13:07
  • 2
    SQL queries are compiled, they cannot dynamaically refer to schema objects at run time; you will need to first build a query using the table metadata and then execute it with `exec` or `sp_executesql` – Stu Nov 21 '22 at 13:15

2 Answers2

0

With help of Stu's answer and this answer I was able to make workable solution.

declare @RowsToProcess  int
declare @CurrentRow     int
declare @SelectCol     nvarchar(max)
declare @SelectTable     nvarchar(max)

declare @tablesAndColumns table(RowID int not null primary key identity(1,1), table_name nvarchar(max), column_name nvarchar(max)

insert into @tablesAndColumns  
select TABLE_NAME,COLUMN_NAME,DATA_TYPE
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA = 'my schema' 
and   TABLE_NAME in ('myTable', 'myTable2' ,'myTable3')

set @RowsToProcess=@@ROWCOUNT

set @CurrentRow=0
while @CurrentRow<@RowsToProcess
begin
    set @CurrentRow=@CurrentRow+1
    select 
        @SelectCol=column_name,
        @SelectTable=table_name
        from @tablesAndColumns
        where RowID=@CurrentRow

    declare @QRY NVARCHAR(MAX)
    set @QRY = ' insert into [my_schema].[result_table] (table_name,column_name,distinct_values) 
                 SELECT ' + '''' +@SelectTable+ '''' + ', ' +  '''' +@SelectCol+ ''''   +   ', count(*) as cnt  
                 FROM (SELECT DISTINCT ' +@SelectCol+ ' FROM my_schema.'+ @SelectTable+') as a'
    exec SP_EXECUTESQL @QRY

end
Ville
  • 57
  • 9
0

I'd like to propose another way. You can run through all the column and table names by using a CURSOR. That way you don't need to store them beforehand and can directly access them in your loop while also having a while condition.

Also I went with sys.tables and sys.columns since I noticed that INFORMATION_SCHEMA also contains views and sys.tables can be filtered for the table's type.

I added a "HAVING COUNT(*) >= 5" into the dynamic SQL so I don't save those informations in the first place rather than filtering them later.

Finally I went with "(NOLOCK)" because you only try to acces the tables for reading and that way you don't lock them for other users / interactions.

(The @i and @max are just for tracking the progress since I ran the query on ~10k columns and just wanted to see how far it is.)

Hopefully might be helpful aswell although you seem to have solved your problem.

DECLARE @columnName nvarchar(100),
    @tableName nvarchar(100),
    @sql nvarchar(MAX),
    @i int = 0,
    @max int = (SELECT COUNT(*)
                FROM sys.tables T
                INNER JOIN sys.columns C ON T.object_id = C.object_id
                WHERE T.[type] = 'U')

DROP TABLE IF EXISTS #resultTable

CREATE TABLE #resultTable (ColumnName nvarchar(100), TableName nvarchar(100), ResultCount int)

DECLARE db_cursor CURSOR FOR
SELECT C.[name], T.[name]
FROM sys.tables T
INNER JOIN sys.columns C ON T.object_id = C.object_id
WHERE T.[type] = 'U'

OPEN db_cursor  
FETCH NEXT FROM db_cursor INTO @columnName, @tableName

WHILE @@FETCH_STATUS = 0  
BEGIN  
    SET @sql = CONCAT(' INSERT INTO #resultTable (ColumnName, TableName, ResultCount)
                        SELECT ''', @columnName, ''', ''', @tableName, ''', COUNT(*)
                        FROM (
                            SELECT DISTINCT [', @columnName, ']
                            FROM [', @tableName, '] (NOLOCK)
                            WHERE [', @columnName, '] IS NOT NULL
                        ) t
                        HAVING COUNT(*) >= 5')

    EXEC sp_executesql @sql

    SET @i = @i + 1
    PRINT CONCAT(@i, ' / ', @max)

    FETCH NEXT FROM db_cursor INTO @columnName, @tableName
END 

CLOSE db_cursor  
DEALLOCATE db_cursor 

SELECT *
FROM #resultTable
Dmitrij
  • 1
  • 2