Parsing out complete dynamic SQL expressions from hundreds of stored procedures

Question

I've inherited a large application that has many hundreds of stored procedures, many of which use dynamic SQL. In order to get a better handle of the types of SQL I am dealing with, it would be extremely useful if I had a way to parse the query text of all these stored procedures and extract the complete expression for any dynamic SQL contained within.

A simplified expression might be:

declare @query nvarchar(max)
set @query = 'SELECT col1,col2,col3 from ' + @DatabaseName + '.dbo.' + @TableName + ' WHERE {some criteria expression that also contains inline quotes}'

The output I am seeking for the above (that would ultimately be called in a single query that parses all stored procedures) is:

SELECT col1, col2, col3 
FROM ' + @DatabaseName + '.dbo.' + @TableName + ' 
WHERE {some criteria expression that also contains inline quotes}

So, not the expression after the parameter values have been passed in, but the expression text as in the stored procedure text, including the parameter names.

I'm ok with the not-at-all-safe assumption that the dynamic SQL parameter name is @query, so searching for this within the SQL expression to use as a starting position to extract text would be tolerable, but since there are single quotes inline, I have no easy way of knowing where the assignment to the variable is complete.

I'm including the [antlr] and [parsing] tags in this question because I have a feeling this is beyond what's capable in T-SQL.

PS: Yes, I'm well aware "I shouldn't be doing this".

EDIT

From a suggestion below, tried the following query but not really useful in this context:

SELECT 
db_name(dbid) DB_NAME
,cacheobjtype, objtype, object_name(objectid) ObjectName
,objectid 
,x.text
,usecounts 
--  , x.*,z.* ,db_name(dbid)
FROM 
sys.dm_exec_cached_plans z
CROSS APPLY sys.dm_exec_sql_text(plan_handle)  x
WHERE 
    --usecounts > 1 
    --objType = 'Proc' and  -- include if you only want to see stored procedures 
    db_name(dbid) not like 'ReportServer%' and db_name(dbid) <> 'msdb' and db_name(dbid) not like 'DBADB%' and db_name(dbid) <> 'master'
--ORDER BY usecounts DESC
ORDER BY objtype

While this is beyond the capabilities of T-SQL (unless you get very clunky and inaccurate with string matching), it's not beyond the capabilities of [`Microsoft.SqlServer.TransactSql.ScriptDom`](https://msdn.microsoft.com/library/microsoft.sqlserver.transactsql.scriptdom), which is the parser used internally by the [SQL Server Data Tools](https://msdn.microsoft.com/library/hh272686). This may be a more useful starting point than ANTLR. — Jeroen Mostert, Jun 16 '16 at 15:48
On a related note, [sys.dm_exec_cached_plans](https://msdn.microsoft.com/library/ms187404) can be a useful tool for finding out which of the many possible queries that can be generated are actually being generated. While parameterization will not span tables or databases, it's still more useful than what raw profiler traces would produce. — Jeroen Mostert, Jun 16 '16 at 16:26
dm_exec_cached_plans seemed possibly useful, but for my problem, I don't think sorting what's relevant from what isn't is possible. — tbone, Jun 16 '16 at 17:32

score 5 · Accepted Answer · answered Jun 16 '16 at 19:51

To a first approximation, here's how you'd do it in C# using ScriptDom.

Getting a list of all stored procedure definitions is easy. That can be done in T-SQL, even:

sp_msforeachdb 'select definition from [?].sys.sql_modules'

Or script databases the usual way, or use SMO. In any case, I'm assuming you can get these into a List<string> somehow, for consumption by code.

Microsoft.SqlServer.TransactSql.ScriptDom is available as a NuGet package, so add that to a brand new application. The core of our problem is writing a visitor that will pluck the nodes we're interested in from a T-SQL script:

class DynamicQueryFinder : TSqlFragmentVisitor {
  public List<ScalarExpression> QueryAssignments { get; } = new List<ScalarExpression>();
  public string ProcedureName { get; private set; }

  // Grab "CREATE PROCEDURE ..." nodes
  public override void Visit(CreateProcedureStatement node) {
    ProcedureName = node.ProcedureReference.Name.BaseIdentifier.Value;
  }

  // Grab "SELECT @Query = ..." nodes
  public override void Visit(SelectSetVariable node) {
    if ("@Query".Equals(node.Variable.Name, StringComparison.OrdinalIgnoreCase)) {
      QueryAssignments.Add(node.Expression);
    }
  }

  // Grab "SET @Query = ..." nodes
  public override void Visit(SetVariableStatement node) {
    if ("@Query".Equals(node.Variable.Name, StringComparison.OrdinalIgnoreCase)) {
      QueryAssignments.Add(node.Expression);
    }
  }

  // Grab "DECLARE @Query = ..." nodes
  public override void Visit(DeclareVariableElement node) {
    if (
      "@Query".Equals(node.VariableName.Value, StringComparison.OrdinalIgnoreCase) && 
      node.Value != null
    ) {
      QueryAssignments.Add(node.Value);
    }
  }
}

Let's say procedures is a List<string> that has the stored procedure definitions, then we apply the visitor like so:

foreach (string procedure in procedures) {
  TSqlFragment fragment;
  using (var reader = new StringReader(procedure)) {
    IList<ParseError> parseErrors;
    var parser = new TSql130Parser(true);  // or a lower version, I suppose
    fragment = parser.Parse(reader, out parseErrors);
    if (parseErrors.Any()) {
      // handle errors
      continue;
    }
  }
  var dynamicQueryFinder = new DynamicQueryFinder();
  fragment.Accept(dynamicQueryFinder);
  if (dynamicQueryFinder.QueryAssignments.Any()) {
    Console.WriteLine($"===== {dynamicQueryFinder.ProcedureName} =====");
    foreach (ScalarExpression assignment in dynamicQueryFinder.QueryAssignments) {
      Console.WriteLine(assignment.Script());
    }
  }
}

.Script() is a little convenience method I cobbled up so we can turn fragments back into plain text:

public static class TSqlFragmentExtensions {
  public static string Script(this TSqlFragment fragment) {
    return String.Join("", fragment.ScriptTokenStream
      .Skip(fragment.FirstTokenIndex)
      .Take(fragment.LastTokenIndex - fragment.FirstTokenIndex + 1)
      .Select(t => t.Text)
    );
  }
}

This will print all expressions in all stored procedures that are assigned to a variable named @Query.

The nice thing about this approach is that you will have the statements parsed at your fingertips, so more complicated processing, like turning the string expressions back into their unescaped forms or hunting for all instances of EXEC(...) and sp_executesql (regardless of variable names involved), is also possible.

The drawback, of course, is that this isn't pure T-SQL. You can use any .NET language you like for it (I've used C# since I'm most comfortable with that), but it still involves writing external code. More primitive solutions like just CHARINDEXing your way over strings may work, if you know that all code follows a particular pattern that is simple enough for T-SQL string operations to analyze.

It will take me a while to confirm this works but since I can't imagine any TSQL approach working, and it certainly sounds like you know what you're talking about, I'm going to mark it as correct. Thanks very much by the way, if this works it will be amazingly useful. — tbone, Jun 16 '16 at 20:58
@tbone: It's actual code that I compiled and ran against an actual database, for what it's worth (although I used `@sql` as the variable name, since that's the one in vogue on our systems). Of course, if you find an error or a way to improve it I'll be all ears! — Jeroen Mostert, Jun 16 '16 at 21:01
Ok good to know you ran it successfully, thanks for updating with that. After seeing this I'm now wondering what other long outstanding hard problems I have that you could answer without breaking a sweat! :) — tbone, Jun 16 '16 at 21:09

Parsing out complete dynamic SQL expressions from hundreds of stored procedures

1 Answers1

Linked