0

So here is my problem. I am creating a "mapping" document to make data analysis for conversion from an unfamiliar database easier. I copy this into an Excel document for ease of use. I've got the standard INFORMATION_SCHEMA stuff like column name, data_type, etc... but I need a sampling of data to make mapping decisions easier.

Items the code needs:

A sample of what is in the column for every column in X schema (CONV in my case).

I solved this in the code: If it has all nulls = "All Nulls", if it has a single unique value = "Unique Value: [value here]", and if it has many values "Multiple Values: [comma separated quoted list goes here] and [count distinct - 3 if there are more than 3 distinct items] more."

The problem:

The very nature of this is that it runs slowly. Is there anything I can do to optimize this count distinct?

if exists (select 1 from INFORMATION_SCHEMA.TABLES where TABLE_SCHEMA = 'dbo' and TABLE_NAME = 'MappingUniqueValuesTbl')begin drop table dbo.MappingUniqueValuesTbl end
create table dbo.MappingUniqueValuesTbl (UniqueVal varchar(max),tblname varchar(max),colname varchar(max))

set nocount on

SET ANSI_WARNINGS OFF

declare UniqVal cursor 
for

select 
 'declare @tbl table (id numeric(12) IDENTITY(1,1),fld varchar(max))'
+' insert into @tbl'
+' select fld'
+'   from(select fld = ['+COLUMN_NAME+']'
+'  from '+TABLE_SCHEMA+'.'+TABLE_NAME
+' group by ['+COLUMN_NAME+']) T'
+' where fld is not null'
+' declare @cnt numeric(10) = isnull((select MAX(id) from @tbl),0)'
+' declare @top3 varchar(max) = '''''
+' set @top3 = @top3+isnull(''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 1)+''"'','''')'
+' set @top3 = @top3+isnull('',''+''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 2)+''"'','''')'
+' set @top3 = @top3+isnull('',''+''"''+(select left(fld,25)+case when len(fld)>25 then ''...'' else '''' end from @tbl where id = 3)+''"'','''')'
+' insert into dbo.MappingUniqueValuesTbl '
+' select UniqueVal = ('
+' select case when @cnt = 0 then ''All Nulls'''
+'  when @cnt = 1 then ''Unique Value: '' + @top3'
+'    when @cnt > 1 then ''Multiple Values: '+CHAR(39)
+' +@top3+case when @cnt > 3 then '', and ''+cast(@cnt-3 as varchar(10))+'' more.'' else '''' end'
+'   end) ,'''+TABLE_NAME+''' ,'''+replace(COLUMN_NAME,CHAR(39),CHAR(39)+'+CHAR(39)+'+CHAR(39))+''''
  from INFORMATION_SCHEMA.COLUMNS
 where TABLE_SCHEMA = 'CONV'
   and TABLE_NAME not like '%BAK%'

open UniqVal
declare @sqlexec varchar(max)
  fetch next 
   from UniqVal into @sqlexec
  while (@@FETCH_STATUS = 0)
        begin 
        --select @sqlexec
        exec (@sqlexec)
        print cast(getdate() as time)
        fetch next from UniqVal into @sqlexec
        end
  close UniqVal
deallocate UniqVal
set nocount off
SET ANSI_WARNINGS ON

-- Select Results
select * from dbo.MappingUniqueValuesTbl
go

What this results in is a single instance of the following for every column.

declare @tbl table (id numeric(12) IDENTITY(1,1),fld varchar(max)) 
insert into @tbl
select fld   
  from(select fld = [Tst_Field]  
         from DBO.TestTbl
        group by [Tst_Field]) T 
 where fld is not null 
declare @cnt numeric(10) = isnull((select MAX(id) from @tbl),0) 
declare @top3 varchar(max) = '' 
set @top3 = @top3+isnull('"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 1)+'"','') 
set @top3 = @top3+isnull(','+'"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 2)+'"','') 
set @top3 = @top3+isnull(','+'"'+(select left(fld,25)+case when len(fld)>25 then '...' else '' end from @tbl where id = 3)+'"','') 
insert into dbo.MappingUniqueValuesTbl  
select UniqueVal = (
        select case when @cnt = 0 then 'All Nulls'  
                    when @cnt = 1 then 'Unique Value: ' + @top3    
                    when @cnt > 1 then 'Multiple Values: ' +@top3+case when @cnt > 3 then ', and '+cast(@cnt-3 as varchar(10))+' more.' else '' end   
                     end) 
      ,'TestTbl' 
      ,'Tst_Field'

Hopefully this is clear enough, any tips or ideas would be phenomenal

Last note: I did find someone who was doing something similar but with ALL the distinct data from all their tables. I couldn't think how to modify it to fit what I need though.

Here is the actual execution plan on one of my columns per Juan's request. https://drive.google.com/file/d/0B9n-RmuEDGf6eEVzU2xOQkEtaXM/view?usp=sharing

Also, here's the image:

Edit: One more image, to show what happens on non-indexed, which I think is my problem.Non-Indexed Actual Plan

Community
  • 1
  • 1
CMoltedo
  • 3
  • 1
  • 5
  • So you can create the dynamic query but this run slow? running the created query direct on the sql is faster? show us the Execution plan of the real query (not the dynamic) http://stackoverflow.com/questions/7359702/how-do-i-obtain-a-query-execution-plan – Juan Carlos Oropeza Nov 05 '15 at 16:51
  • Shared .sqlplan file on Google drive, link at bottom of post. – CMoltedo Nov 05 '15 at 17:22
  • You didnt answer my other questions :( And dont you have one with a picture or just simple text? Very hard to read that xml – Juan Carlos Oropeza Nov 05 '15 at 17:24
  • Aw, sorry. Edit: whoops, hit enter too early. 1. No, the dynamic query generates in less than a second. 2. I'm not sure if it runs quicker if I run them individually as it is written to re-declare all variables every time it's run. I could go through and use text editor to add GO's if it's needed. This whole thing is intended to be able to run on any data, you can give it a go on your own (limit by an IN() for just a few tables if you are impatient). – CMoltedo Nov 05 '15 at 17:28

1 Answers1

0

checking your plan looks like first insert has the most cost.

that can be rewrite as

insert into @tbl
   select distinct [Tst_Field] as fld
   from DBO.TestTbl
   where [Tst_Field] is not null 

Try it and let me know

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118