0

Would like to create analysis for queries in SQL Server which will be checking which tables were join on which tables on which columns. Would like to use ScriptDom in C#. For example we have query:

SELECT *
FROM tabel1
INNER JOIN tabel2 ON tabel1.varchar1col = tabel2.varchar1col
INNER JOIN tabel3 ON tabel1.varchar2col = tabel3.varchar2col
LEFT JOIN tabel4 ON tabel3.varchar2col = tabel4.varchar3col

And would like to have result:

tabel1 varchar1col
tabel1 varchar2col
tabel2 varchar1col
tabel3 varchar2col
tabel4 varchar3col

Is it possible to create it in ScriptDom?

Dale K
  • 25,246
  • 15
  • 42
  • 71
adam.g
  • 101
  • 3
  • The script DOM can produce the desired result, but only because the column references in your query are qualified with the table name. A more generalized solution (e.g. unqualified column names) is not possible with the script DOM because it does not resolve column references. – Dan Guzman Apr 07 '23 at 10:22
  • @DanGuzman so If we have aliases for tables then we can not retrieve column names correct ? – adam.g Apr 07 '23 at 10:37
  • With aliases, the analysis code would need to map table aliases to table names in order to resolve the column references to the corresponding tables for the result. Consider also that a query may have valid syntax that will parse successfully but will fail at run time when aliases are undefined: `SELECT * FROM table1 AS a INNER JOIN table2 AS b ON y.varchar1col = z.varchar1col;` – Dan Guzman Apr 07 '23 at 11:13
  • Ok, do you know when can I get sample of code to retrieve desired result (knowing the problems of course) ? – adam.g Apr 07 '23 at 11:24
  • I'll post an example PowerShell script when I have a few minutes. – Dan Guzman Apr 07 '23 at 12:34
  • This gets pretty complicated quickly, since aliases / columns can come from CTEs etc. I wouldn't say it's impossible, but it's not a quick script – siggemannen Apr 07 '23 at 13:08

1 Answers1

3

As discussed in comments, the parser does not actually resolve column names referenced in JOIN/WHERE clause predicates to objects in the query. Consider a syntactically valid query can reference invalid tables/aliases, unqualified names are ambiguous, plus constructs like CTEs, and derived tables, etc. present a challenge.

The example PowerShell script below (which you can convert to your .NET language of choice) uses a script DOM visitor. This will provide the desired results for the query in your question because the column names qualified with the actual table name. For more generalized use, predicates would need to consistently specify a 2-part name that reference a table (not alias, CTE, derived table, etc.).

class ColumnReference {
    [String]$TableName
    [String]$ColumnName
}

class ColumnReferenceVisitor: Microsoft.SqlServer.TransactSql.ScriptDom.TSqlFragmentVisitor {

    [System.Collections.Generic.List[ColumnReference]]$ColumnReferences

    [void]Visit ([Microsoft.SqlServer.TransactSql.ScriptDom.BooleanComparisonExpression] $fragment) {

        # assume 2-part column references are a table and column name
        if($fragment.FirstExpression.GetType().ToString() -eq "Microsoft.SqlServer.TransactSql.ScriptDom.ColumnReferenceExpression") {
            [Microsoft.SqlServer.TransactSql.ScriptDom.ColumnReferenceExpression]$colRef = $fragment.FirstExpression
            if($colRef.MultiPartIdentifier.Count -eq 2) {
                $columnReference = [ColumnReference]::new()
                $columnReference.TableName = $colRef.MultiPartIdentifier[0].Value
                $columnReference.ColumnName = $colRef.MultiPartIdentifier[1].Value
                $this.ColumnReferences.Add($columnReference)
            }
        }

        if($fragment.SecondExpression.GetType().ToString() -eq "Microsoft.SqlServer.TransactSql.ScriptDom.ColumnReferenceExpression") {
            [Microsoft.SqlServer.TransactSql.ScriptDom.ColumnReferenceExpression]$colRef = $fragment.SecondExpression
            if($colRef.MultiPartIdentifier.Count -eq 2) {
                $columnReference = [ColumnReference]::new()
                $columnReference.TableName = $colRef.MultiPartIdentifier[0].Value
                $columnReference.ColumnName = $colRef.MultiPartIdentifier[1].Value
                $this.ColumnReferences.Add($columnReference)
            }
        }

    }

    # parse script and invoke visitor
    static [System.Collections.Generic.List[ColumnReference]] getColumnReferences($script) {

        $parser = New-Object Microsoft.SqlServer.TransactSql.ScriptDom.TSql160Parser($true)
        $parseErrors = New-Object System.Collections.Generic.List[Microsoft.SqlServer.TransactSql.ScriptDom.ParseError]
        $stringReader = New-Object System.IO.StringReader($script)

        $fragment = $parser.Parse($stringReader, [ref]$parseErrors)
        if($parseErrors.Count -gt 0) {
            throw "$($parseErrors.Count) parsing errors: $(($parseErrors | ConvertTo-Json))"
        }

        $visitor = [ColumnReferenceVisitor]::new()
        $fragment.Accept($visitor)

        return $visitor.ColumnReferences

    }

    ColumnReferenceVisitor() {

        $this.ColumnReferences = New-Object System.Collections.Generic.List[ColumnReference]

    }
}


# ############
# ### MAIN ###
# ############

$script = @"
SELECT *
FROM tabel1
INNER JOIN tabel2 ON tabel1.varchar1col = tabel2.varchar1col
INNER JOIN tabel3 ON tabel1.varchar2col = tabel3.varchar2col
LEFT JOIN tabel4 ON tabel3.varchar2col = tabel4.varchar3col
"@

try {

    $columnReferences = [ColumnReferenceVisitor]::getColumnReferences($script)
    $columnReferences | Sort-Object -Property TableName, ColumnName -Unique | Format-Table

}
catch {

    throw

}

Note that visitors in PowerShell need the script DOM assembly loaded before the derived class will compile. This can be done with a wrapper script like the example below that loads the assembly and dot-sources the script with the above visitor code. I cheated here and used the assembly included with the latest SSMS version. One should generally the assembly included in the DacFX NuGet package .

Add-Type -Path "C:\Program Files (x86)\Microsoft SQL Server Management Studio 19\Common7\IDE\Extensions\Application\Microsoft.SqlServer.TransactSql.ScriptDom.dll"

Results:

TableName ColumnName
tabel1 varchar1col
tabel1 varchar2col
tabel2 varchar1col
tabel3 varchar2col
tabel4 varchar3col
. "C:\PowerShellScripts\Get-ColumnReferences.ps1"
Dan Guzman
  • 43,250
  • 3
  • 46
  • 71