8

Is there a way how to get a list of tables used within an SQL query? Example : I have something like :

SELECT * FROM Table t JOIN OtherTable ON t.id=OtherTable.t_id

and I would expect to get

Table, OtherTable

Thanks

Biggles
  • 1,306
  • 1
  • 12
  • 22
  • 3
    Why do you want this? – Dhwani May 22 '13 at 12:56
  • 2
    Any particular RDBMS/SQL Dialect? If TSQL you could probably look at `TSql100Parser` [example of using it to identify parameters here](http://stackoverflow.com/a/5793088/73226) – Martin Smith May 22 '13 at 12:56
  • Barring a somewhat complex RegEx, I can't think of anything that would do that. I'm curious to see if there is some API or query for that now. – Geeky Guy May 22 '13 at 12:56
  • 1
    I second ITBeginner's question...Why do you want to do this? – hanut May 22 '13 at 12:58
  • 1
    Well, the logic is you have to find all the words appearing after keywords such as FROM JOIN etc etc. and list them. – Vivek Sadh May 22 '13 at 12:59
  • Writing some parser is obviously an option, but maybe there is a way how to resolve this in better way. Well I'm trying to build some GUI for users to filter the content of a report. The report is generated based on SQL query and we want to inject some filter into the WHERE clause. – Biggles May 22 '13 at 13:02
  • do you have the SQL query as a string variable in c# or do you need to do this through SQL only? – bendataclear May 23 '13 at 08:05
  • By default I'm in C#, but I can call SQL if I need to – Biggles May 23 '13 at 08:07

5 Answers5

11

One solution using C# is to import Microsoft.SqlServer.TransactSql.ScriptDom (I found the dll at C:\Program Files (x86)\Microsoft SQL Server\120\SDK\Assemblies\Microsoft.SqlServer.TransactSql.ScriptDom.dll) then do the following:

private List<string> GetTableNamesFromQueryString(string query)
{
    IList<ParseError> errors = new List<ParseError>();
    IList<TSqlParserToken> queryTokens;
    List<string> output = new List<string>(16);
    StringBuilder sb = new StringBuilder(128);
    TSql120Parser parser = new TSql120Parser(true);
    TSqlTokenType[] fromTokenTypes = new TSqlTokenType[2]
        {
            TSqlTokenType.From,
            TSqlTokenType.Join
        };
    TSqlTokenType[] identifierTokenTypes = new TSqlTokenType[2]
        {
            TSqlTokenType.Identifier,
            TSqlTokenType.QuotedIdentifier
        };

    using (System.IO.TextReader tReader = new System.IO.StringReader(query))
    {
        queryTokens = parser.GetTokenStream(tReader, out errors);
        if (errors.Count > 0) { return errors.Select(e=>"Error: " + e.Number + " Line: " + e.Line + " Column: " + e.Column + " Offset: " + e.Offset + " Message: " + e.Message).ToList(); }

        for (int i = 0; i < queryTokens.Count; i++)
        {
            if(fromTokenTypes.Contains(queryTokens[i].TokenType))
            {
                for (int j = i + 1; j < queryTokens.Count; j++)
                {
                    if (queryTokens[j].TokenType == TSqlTokenType.WhiteSpace) { continue; }
                    else if (identifierTokenTypes.Contains(queryTokens[j].TokenType))
                    {
                        sb.Clear();

                        GetQuotedIdentifier(queryTokens[j], sb);            //Change Identifiers to QuotedIdentifier (text only)

                        while (j + 2 < queryTokens.Count && queryTokens[j + 1].TokenType == TSqlTokenType.Dot && identifierTokenTypes.Contains(queryTokens[j + 2].TokenType))
                        {
                            sb.Append(queryTokens[j + 1].Text);
                            GetQuotedIdentifier(queryTokens[j + 2], sb);    //Change Identifiers to QuotedIdentifier (text only)

                            j += 2;
                        }

                        output.Add(sb.ToString());
                        break;              //exit the loop
                    }
                    else { break; }             //exit the loop if token is not a FROM, a JOIN, or white space.
                }

            }
        }

        return output.Distinct().OrderBy(tableName => tableName).ToList();
    }
}

private void GetQuotedIdentifier(TSqlParserToken token, StringBuilder sb)
{
    switch(token.TokenType)
    {
        case TSqlTokenType.Identifier: sb.Append('[').Append(token.Text).Append(']'); return;
        case TSqlTokenType.QuotedIdentifier: sb.Append(token.Text); return;
        default: throw new ArgumentException("Error: expected TokenType of token should be TSqlTokenType.Identifier or TSqlTokenType.QuotedIdentifier");
    }
}

I came up with this after trying to get this answer to work.

Community
  • 1
  • 1
Trisped
  • 5,705
  • 2
  • 45
  • 58
8

you can use this sql script right after your query. It will return a list of tables used in the last executed query:

   SELECT Field1, Field2 
   FROM Table t JOIN OtherTable ON t.id=OtherTable.t_id

  ;WITH vwQueryStats AS(
     SELECT 
      COALESCE(OBJECT_NAME(s2.objectid),'Ad-Hoc') AS ProcName
      ,execution_count
      ,s2.objectid
      ,(
         SELECT TOP 1 
            SUBSTRING(s2.TEXT,statement_start_offset / 2+1 
            ,( ( CASE WHEN statement_end_offset = -1
                THEN (LEN(CONVERT(NVARCHAR(MAX),s2.TEXT)) * 2)
                ELSE statement_end_offset END)- statement_start_offset) / 2+1)) AS sql_statement
            ,last_execution_time
         FROM sys.dm_exec_query_stats AS s1
         CROSS APPLY sys.dm_exec_sql_text(sql_handle) AS s2
    )
    SELECT TOP 1 * 
    INTO #lastQueryStats
    FROM vwQueryStats x
    WHERE sql_statement NOT like 'WITH vwQueryStats AS%'
    ORDER BY last_execution_time DESC

    SELECT
    TABLE_NAME
    FROM #lastQueryStats, INFORMATION_SCHEMA.TABLES tab 
    WHERE CHARINDEX( tab.TABLE_NAME, sql_statement) > 0


    DROP TABLE #lastQueryStats 

I've taken the query that retrieves the last executed query from this post and I modified it a bit to match with your example.

The output will be as you requested:

 Table
 OtherTable

Then if you want to have them comma separated you can do:

DECLARE @tableNames VARCHAR(MAX) 

SELECT @tableNames = COALESCE(@tableNames + ', ', '') + TABLE_NAME
FROM   #lastQueryStats, INFORMATION_SCHEMA.TABLES tab 
WHERE  CHARINDEX( tab.TABLE_NAME, sql_statement) > 0

SELECT @tableNames 

However you should be wary that in a 'usual' production or QA environment with thousands of query executed concurrently this mightn't work as another query could be executed in between your first query and the query that extracts info from db stats.

Hope it helps

Community
  • 1
  • 1
codingadventures
  • 2,924
  • 2
  • 19
  • 36
  • This seems very interesting, however it doesn't work. Well, even worse, it works only sometimes. I mean the important park, obviously comma separation is not as issue. I'm sure I'm the only one making queries to the database (it is a local one), but most of the time I receive only empty response. – Biggles May 22 '13 at 14:15
  • are you executing them together in the same session? – codingadventures May 22 '13 at 14:22
  • Yes I am. I have a database with ~hundrets of records and it works always when I call SELECT TOP 5 * FROM Result r JOIN Sample s ON r.sample_id=s.sample_id and practically never, when I call SELECT * FROM Result r JOIN Sample s ON r.sample_id=s.sample_id – Biggles May 23 '13 at 06:24
  • @user436730 you'r right, sorry I fixed it. issue was in this condition sql_statement NOT like 'SELECT TOP 1 * FROM(SELECT %' I changed into WITH vwQueryStats AS% I modified the original snippet forgetting to change the condition on the last executed query. I tried it and works now – codingadventures May 23 '13 at 08:01
4

Code below is based on Trisped's answer, but modified to work with fully qualified table names that omit a schema name, and a few cleanups/optimizations:

public class Parser
{
    public static List<string> GetTableNamesFromQueryString(string query)
    {
        var output = new List<string>();
        var sb = new StringBuilder();
        var parser = new TSql120Parser(true);

        var fromTokenTypes = new[]
        {
            TSqlTokenType.From,
            TSqlTokenType.Join
        };

        var identifierTokenTypes = new[]
        {
            TSqlTokenType.Identifier,
            TSqlTokenType.QuotedIdentifier
        };

        using (System.IO.TextReader tReader = new System.IO.StringReader(query))
        {
            IList<ParseError> errors;
            var queryTokens = parser.GetTokenStream(tReader, out errors);
            if (errors.Any())
            {
                return errors
                    .Select(e => string.Format("Error: {0}; Line: {1}; Column: {2}; Offset: {3};  Message: {4};", e.Number, e.Line, e.Column, e.Offset, e.Message))
                    .ToList();
            }

            for (var i = 0; i < queryTokens.Count; i++)
            {
                if (fromTokenTypes.Contains(queryTokens[i].TokenType))
                {
                    for (var j = i + 1; j < queryTokens.Count; j++)
                    {
                        if (queryTokens[j].TokenType == TSqlTokenType.WhiteSpace)
                        {
                            continue;
                        }

                        if (identifierTokenTypes.Contains(queryTokens[j].TokenType))
                        {
                            sb.Clear();
                            GetQuotedIdentifier(queryTokens[j], sb);

                            while (j + 2 < queryTokens.Count 
                                && queryTokens[j + 1].TokenType == TSqlTokenType.Dot 
                                && (queryTokens[j + 2].TokenType == TSqlTokenType.Dot || identifierTokenTypes.Contains(queryTokens[j + 2].TokenType)))
                            {
                                sb.Append(queryTokens[j + 1].Text);

                                if (queryTokens[j + 2].TokenType == TSqlTokenType.Dot)
                                {
                                    if (queryTokens[j - 1].TokenType == TSqlTokenType.Dot) 
                                        GetQuotedIdentifier(queryTokens[j + 1], sb);

                                    j++;

                                }
                                else
                                {
                                    GetQuotedIdentifier(queryTokens[j + 2], sb);
                                    j += 2;
                                }
                            }

                            output.Add(sb.ToString());
                        }
                        break;
                    }
                }
            }

            return output.Distinct().OrderBy(tableName => tableName).ToList();
        }
    }

    private static void GetQuotedIdentifier(TSqlParserToken token, StringBuilder sb)
    {
        switch (token.TokenType)
        {
            case TSqlTokenType.Identifier: 
                sb.Append('[').Append(token.Text).Append(']'); 
                break;
            case TSqlTokenType.QuotedIdentifier:
            case TSqlTokenType.Dot: 
                sb.Append(token.Text); 
                break;

            default: throw new ArgumentException("Error: expected TokenType of token should be TSqlTokenType.Dot, TSqlTokenType.Identifier, or TSqlTokenType.QuotedIdentifier");
        }
    }
}
Community
  • 1
  • 1
Thiago Silva
  • 14,183
  • 3
  • 36
  • 46
  • What situation is this code block made to handle? if (queryTokens[j + 2].TokenType == TSqlTokenType.Dot) { if (queryTokens[j - 1].TokenType == TSqlTokenType.Dot) GetQuotedIdentifier(queryTokens[j + 1], sb); j++; } I'm writing unit tests for this, and can't find a way into this block. – Matthew Vines Aug 09 '17 at 20:22
  • @MatthewVines it's been 2 years, I honestly don't recall, but what comes to mind are tokens with fully qualified names with schemas, such as "select * from [mydb].[myschema].[mytable]" – Thiago Silva Aug 11 '17 at 22:47
  • Thank you for giving it some attention. After a lot of different scenarios, I was able to strip that set of expressions to if (queryTokens[j + 2].TokenType == TSqlTokenType.Dot) { j++; } without any loss of functionality. – Matthew Vines Aug 14 '17 at 19:03
1

One hackish way you could accomplish this would be to explicitly name the fields in your query and prefix them with the table name, e.g.

SELECT Field1 As "OtherTable.Field1",
       Field2 As "Table.Field2"
FROM Table t JOIN OtherTable ON t.id=OtherTable.t_id

Essentially, you're providing your own metadata in the query results. After your query returns, look at the column names and implement custom logic to split out the table names.

George Johnston
  • 31,652
  • 27
  • 127
  • 172
1

Trisped's solution works perfectly. I modified one line to ensure case insensitive and trim off the brackets.

OLD: output.Add(sb.ToString());

NEW: output.Add(sb.ToString().ToLower().Trim(new char[]{'[', ']'}));

Shrey Gupta
  • 5,509
  • 8
  • 45
  • 71
eeldivad
  • 43
  • 6