So here is my problem. I am creating a "mapping" document to make data analysis for conversion from an unfamiliar database easier. I copy this into an Excel document for ease of use. I've got the standard INFORMATION_SCHEMA stuff like column name, data_type, etc... but I need a sampling of data to make mapping decisions easier.
Items the code needs:
A sample of what is in the column for every column in X schema (CONV in my case).
I solved this in the code: If it has all nulls = "All Nulls", if it has a single unique value = "Unique Value: [value here]", and if it has many values "Multiple Values: [comma separated quoted list goes here] and [count distinct - 3 if there are more than 3 distinct items] more."
The problem:
The very nature of this is that it runs slowly. Is there anything I can do to optimize this count distinct?
if exists (select 1 from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA = 'dbo' and TABLE_NAME = 'MappingUniqueValuesTbl')begin drop table dbo.MappingUniqueValuesTbl end
create table dbo.MappingUniqueValuesTbl (UniqueVal varchar(max),tblname varchar(max),colname varchar(max))
set nocount on
SET ANSI_WARNINGS OFF
declare UniqVal cursor
for
select
'declare @tbl table (id numeric(12) IDENTITY(1,1),fld varchar(max))'
+' insert into @tbl'
+' select fld'
+' from(select fld = ['+COLUMN_NAME+']'
+' from '+TABLE_SCHEMA+'.'+TABLE_NAME
+' group by ['+COLUMN_NAME+']) T'
+' where fld is not null'
+' declare @cnt numeric(10) = isnull((select MAX(id) from @tbl),0)'
+' declare @top3 varchar(max) = '''''
+' set @top3 = @top3+isnull(''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 1)+''"'','''')'
+' set @top3 = @top3+isnull('',''+''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 2)+''"'','''')'
+' set @top3 = @top3+isnull('',''+''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 3)+''"'','''')'
+' insert into dbo.MappingUniqueValuesTbl '
+' select UniqueVal = ('
+' select case when @cnt = 0 then ''All Nulls'''
+' when @cnt = 1 then ''Unique Value: '' + @top3'
+' when @cnt > 1 then ''Multiple Values: '+CHAR(39)
+' +@top3+case when @cnt > 3 then '', and ''+cast(@cnt-3 as varchar(10))+'' more.'' else '''' end'
+' end) ,'''+TABLE_NAME+''' ,'''+replace(COLUMN_NAME,CHAR(39),CHAR(39)+'+CHAR(39)+'+CHAR(39))+''''
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA = 'CONV'
and TABLE_NAME not like '%BAK%'
open UniqVal
declare @sqlexec varchar(max)
fetch next
from UniqVal into @sqlexec
while (@@FETCH_STATUS = 0)
begin
--select @sqlexec
exec (@sqlexec)
print cast(getdate() as time)
fetch next from UniqVal into @sqlexec
end
close UniqVal
deallocate UniqVal
set nocount off
SET ANSI_WARNINGS ON
-- Select Results
select * from dbo.MappingUniqueValuesTbl
go
What this results in is a single instance of the following for every column.
declare @tbl table (id numeric(12) IDENTITY(1,1),fld varchar(max))
insert into @tbl
select fld
from(select fld = [Tst_Field]
from DBO.TestTbl
group by [Tst_Field]) T
where fld is not null
declare @cnt numeric(10) = isnull((select MAX(id) from @tbl),0)
declare @top3 varchar(max) = ''
set @top3 = @top3+isnull('"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 1)+'"','')
set @top3 = @top3+isnull(','+'"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 2)+'"','')
set @top3 = @top3+isnull(','+'"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 3)+'"','')
insert into dbo.MappingUniqueValuesTbl
select UniqueVal = (
select case when @cnt = 0 then 'All Nulls'
when @cnt = 1 then 'Unique Value: ' + @top3
when @cnt > 1 then 'Multiple Values: ' +@top3+case when @cnt > 3 then ', and '+cast(@cnt-3 as varchar(10))+' more.' else '' end
end)
,'TestTbl'
,'Tst_Field'
Hopefully this is clear enough, any tips or ideas would be phenomenal
Last note: I did find someone who was doing something similar but with ALL the distinct data from all their tables. I couldn't think how to modify it to fit what I need though.
Here is the actual execution plan on one of my columns per Juan's request. https://drive.google.com/file/d/0B9n-RmuEDGf6eEVzU2xOQkEtaXM/view?usp=sharing
Also, here's the image:
Edit: One more image, to show what happens on non-indexed, which I think is my problem.Non-Indexed Actual Plan