6

I have a string representing a SQL query, and I need to extract the names of the tables from that string. For example:

SELECT * FROM Customers

Would return "Customers". Or

SELECT * FROM Customers c, Addresses a WHERE c.CustomerName='foo'

SELECT a.AddressZip FROM Customers c
INNER JOIN Addresses a ON c.AddressId=a.AddressId

Would return "Customers, Addresses". Getting more advanced:

(SELECT B FROM (SELECT C FROM (SELECT Element AS C FROM MyTable)))

Would simply return "MyTable"

(Note, I may have typo'd the queries but you get the idea).

What would be the best/most accurate way of accomplishing this?

esac
  • 24,099
  • 38
  • 122
  • 179
  • The question "Why?" springs to mind.... – Mitch Wheat Jun 26 '10 at 01:51
  • Are you targeting any particular RDBMS/Dialect? There are a few questions on SO about parsing SQL e.g. http://stackoverflow.com/questions/589096/parsing-sql-code-in-c Nothing particularly conclusive that I have seen though. – Martin Smith Jun 26 '10 at 02:08
  • As for 'why' it is because I am parsing a logfile of queries and displaying the most accessed tables, operations, etc.. not that it should matter :) As for what am I targeting, it is MS-SQL. – esac Jun 26 '10 at 02:15
  • 2
    @esac: it matters. Because sometimes people post questions about specifics of a perceived solution to a problem, and not the actual problem itself. – Mitch Wheat Jun 26 '10 at 02:25
  • Is your logfile a MSSQL logfile? – Mitch Wheat Jun 26 '10 at 02:26
  • @Mitch: it is not. it is output from another tool. – esac Jun 28 '10 at 18:31
  • @esac Given the name, I'm surprised [bash](https://bash.cyberciti.biz/guide/The_case_statement) isn't one of your top tags. Cheers! – jpaugh Aug 13 '19 at 15:18
  • 1
    @jpaugh I chose this screen name when I was a wee lad about 25 years ago. At the time there was a new distro of linux just coming on the market named 'Red Hat'. I used Linux almost exclusively for 5 years and then I got a job and entered the business world of Windows computers. – esac Oct 17 '19 at 20:45

2 Answers2

5

Here's a way to do it, using a commercial utility (sqlparser.com $149, free trial)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using gudusoft.gsqlparser;

namespace GeneralSqlParserTest
{
    class Program
    {
        static void Main(string[] args)
        {
            TGSqlParser sqlparser = new TGSqlParser(TDbVendor.DbVMssql);

            sqlparser.SqlText.Text = "SELECT * FROM Customers c, Addresses a WHERE c.CustomerName='foo'";
            sqlparser.OnTableToken += new TOnTableTokenEvent(OnTableToken);

            int result = sqlparser.Parse();
            Console.ReadLine();
        }

        static void OnTableToken(object o, gudusoft.gsqlparser.TSourceToken st, gudusoft.gsqlparser.TCustomSqlStatement stmt)
        {
            Console.WriteLine("Table: {0}", st.AsText);
        }
    }
}

Note that it counts 'c' and 'a' as tables, but it would be pretty simple to filter out single character names from your results

I do not use or own this tool, just something I found after some searching...

Brian Vander Plaats
  • 2,257
  • 24
  • 28
  • It's a good start, although the fact that it picks up 'c' and 'a' are annoying as some are aliased with longer names. – esac Jun 26 '10 at 15:51
1

General SQL Parser mentioned by Brian can achieve this in a more accurate way, here is an article for this:

Get table and column in a complicated SQL script

James
  • 81
  • 1
  • 1