It is a solution based on query plan. It has some adventages
- almost any select queries can be processed
- no SchemaBinding
and disadventages
- has not been tested properly
- can become broken suddenly if Microsoft change XML query plan.
The core idea is that every column expression inside XML query plan is defined in "DefinedValue" node. First subnode of "DefinedValue" is a reference to output column and second one is a expression. The expression computes from input columns and constant values.
As mentioned above It's based only on empirical observation and needs to be tested properly.
It's a invocation example:
exec dbo.GetColumnDependencies 'select * from dbo.vTEST'
target_column_name | source_column_name | const_value
---------------------------------------------------
address | Expr1007 | NULL
name | Expr1006 | NULL
Expr1006 | NULL | ' '
Expr1006 | [testdb].[dbo].first_name | NULL
Expr1006 | [testdb].[dbo].last_name | NULL
Expr1007 | NULL | ' '
Expr1007 | [testdb].[dbo].number | NULL
Expr1007 | [testdb].[dbo].street | NULL
It's code.
First of all get XML query plan.
declare @select_query as varchar(4000) = 'select * from dbo.vTEST' -- IT'S YOUR QUERY HERE.
declare @select_into_query as varchar(4000) = 'select top (1) * into #foo from (' + @select_query + ') as src'
, @xml_plan as xml = null
, @xml_generation_tries as tinyint = 10
;
while (@xml_plan is null and @xml_generation_tries > 0) -- There is no guaranty that plan will be cached.
begin
execute (@select_into_query);
select @xml_plan = pln.query_plan
from sys.dm_exec_query_stats as qry
cross apply sys.dm_exec_sql_text(qry.sql_handle) as txt
cross apply sys.dm_exec_query_plan(qry.plan_handle) as pln
where txt.text = @select_into_query
;
end
if (@xml_plan is null
) begin
raiserror(N'Can''t extract XML query plan from cache.' ,15 ,0);
return;
end
;
Next is a main query. It's biggest part is recursive common table expression for column extraction.
with xmlnamespaces(default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan'
,'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as shp -- Used in .query() for predictive namespace using.
)
, cte_column_dependencies as
(
The seed of recursion is a query that extracts columns for #foo table that store 1 row of interested select query.
select
(select foo_col.info.query('./ColumnReference') for xml raw('shp:root') ,type) -- Becouse .value() can't extract attribute from root node.
as target_column_info
, (select foo_col.info.query('./ScalarOperator/Identifier/ColumnReference') for xml raw('shp:root') ,type)
as source_column_info
, cast(null as xml) as const_info
, 1 as iteration_no
from @xml_plan.nodes('//Update/SetPredicate/ScalarOperator/ScalarExpressionList/ScalarOperator/MultipleAssign/Assign')
as foo_col(info)
where foo_col.info.exist('./ColumnReference[@Table="[#foo]"]') = 1
The recursive part searches for "DefinedValue" node with depended column and extract all "ColumnReference" and "Const" subnodes that used in column expression. It's over complicated by XML to SQL conversions.
union all
select
(select internal_col.info.query('.') for xml raw('shp:root') ,type)
, source_info.column_info
, source_info.const_info
, prev_dependencies.iteration_no + 1
from @xml_plan.nodes('//DefinedValue/ColumnReference') as internal_col(info)
inner join cte_column_dependencies as prev_dependencies -- Filters by depended columns.
on prev_dependencies.source_column_info.value('(//ColumnReference/@Column)[1]' ,'nvarchar(4000)') = internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
and exists (select prev_dependencies.source_column_info.value('(.//@Schema)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]' ,'nvarchar(4000)'))
and exists (select prev_dependencies.source_column_info.value('(.//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)'))
and exists (select prev_dependencies.source_column_info.value('(.//@Server)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]' ,'nvarchar(4000)'))
cross apply ( -- Becouse only column or only constant can be places in result row.
select (select source_col.info.query('.') for xml raw('shp:root') ,type) as column_info
, null as const_info
from internal_col.info.nodes('..//ColumnReference') as source_col(info)
union all
select null as column_info
, (select const.info.query('.') for xml raw('shp:root') ,type) as const_info
from internal_col.info.nodes('..//Const') as const(info)
) as source_info
where source_info.column_info is null
or (
-- Except same node selected by '..//ColumnReference' from its sources. Sorry, I'm not so well to check it with XQuery simple.
source_info.column_info.value('(//@Column)[1]' ,'nvarchar(4000)') <> internal_col.info.value('(./@Column)[1]' ,'nvarchar(4000)')
and (select source_info.column_info.value('(//@Schema)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Schema)[1]' ,'nvarchar(4000)')) is null
and (select source_info.column_info.value('(//@Database)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Database)[1]' ,'nvarchar(4000)')) is null
and (select source_info.column_info.value('(//@Server)[1]' ,'nvarchar(4000)') intersect select internal_col.info.value('(./@Server)[1]' ,'nvarchar(4000)')) is null
)
)
Finally, It's select statement that convert XML to appropriate human text.
select
-- col_dep.target_column_info
--, col_dep.source_column_info
--, col_dep.const_info
coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Server)[1]' ,'nvarchar(4000)') + '.' ,'')
+ coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
+ coalesce(col_dep.target_column_info.value('(.//shp:ColumnReference/@Schema)[1]' ,'nvarchar(4000)') + '.' ,'')
+ col_dep.target_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
as target_column_name
, coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Server)[1]' ,'nvarchar(4000)') + '.' ,'')
+ coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Database)[1]' ,'nvarchar(4000)') + '.' ,'')
+ coalesce(col_dep.source_column_info.value('(.//shp:ColumnReference/@Schema)[1]' ,'nvarchar(4000)') + '.' ,'')
+ col_dep.source_column_info.value('(.//shp:ColumnReference/@Column)[1]' ,'nvarchar(4000)')
as source_column_name
, col_dep.const_info.value('(/shp:root/shp:Const/@ConstValue)[1]' ,'nvarchar(4000)')
as const_value
from cte_column_dependencies as col_dep
order by col_dep.iteration_no ,target_column_name ,source_column_name
option (maxrecursion 512) -- It's an assurance from infinite loop.