Getting table schema from a query

Question

As per MSDN, SqlDataReader.GetSchemaTable returns column metadata for the query executed. I am wondering is there a similar method that will give table metadata for the given query? I mean what tables are involved and what aliases it has got.

In my application, I get the query and I need to append the where clause programically. Using GetSchemaTable(), I can get the column metadata and the table it belongs to. But even though table has aliases, it still return the real table name. Is there a way to get the aliase name for that table?

Following code shows getting the column metadata.

const string connectionString = "your_connection_string";
string sql = "select c.id as s,c.firstname from contact as c";

using(SqlConnection connection = new SqlConnection(connectionString))
using(SqlCommand command = new SqlCommand(sql, connection))
{
    connection.Open();
    SqlDataReader reader = command.ExecuteReader(CommandBehavior.KeyInfo);
    DataTable schema = reader.GetSchemaTable();
    foreach (DataRow row in schema.Rows)
    {
        foreach (DataColumn column in schema.Columns)
        {
            Console.WriteLine(column.ColumnName + " = " + row[column]);
        }
        Console.WriteLine("----------------------------------------");
    }
    Console.Read();
}

This will give me details of columns correctly. But when I see BaseTableName for column Id, it is giving contact rather than the alias name c. Is there any way to get the table schema and aliases from a query like the above?

Any help would be great!

Edit

While I could use the execution plan suggested by Rob, I'd appreciate any alternative simple approaches.

Answering questions by tomekszpakowicz

Are you (or your application) source of the query in question? In that case you should know the aliases.

I am not the author of queries. We have a system where users can enter the query. We build columns out of it using the method I explained above. These details will be persisted and another user can use this like adding new criteria etc. So we need to build the SQL dynamically from the information we have. So when a column is aliased and we are not getting alias name, then the where clause constructed will be invalid.

Thanks

Why are you trying to do this? Why are you appending where clauses to this query? — Tomek Szpakowicz, Jun 22 '10 at 17:29
Alternative simple approach? Use BaseTableName, not the alias name. The real table name works fine, why do you think you need the alias name? — Dour High Arch, Jun 22 '10 at 19:57

Tomek Szpakowicz · Accepted Answer · 2010-06-25T16:36:23.780

Short answer

This won't work. You cannot, by design, get table aliases from result schema. And you cannot rely on being able to get them from query execution plan.

Long answer

When you get result of a SQL query, the query has already been parsed, validated, optimized, compiled into some internal representation and executed. Aliases are part of query's "source code" and are usually lost somewhere around step 1 and 2.

After query is executed the only things that can be seen as tables are a) real physical tables and b) returned data seen as single anonymous table. Everything between can be transformed or completely optimized out.

If DBMSes were required to retain aliases it would be practically impossible to optimize complex queries.

Possible solutions

I suggest restating a problem:

Are you (or your application) source of the query in question? In that case you should know the aliases.
If you get queries provided by someone else... Well... That depends on why are you adding where causes.
- In the worst case, you'll have to parse queries yourself.
- In the best case, you could give them access to views instead of real tables and put where clauses in the views.

Simple and ugly solution

If I understand your requirements correctly:

User A enters query into your program.
User B can run it (but cannot edit it) and sees returned data. Additionally she can add filters based on returned columns using some kind of widget provided by you.
You don't want to apply filter inside application but instead add them to the query, in order to avoid fetching unnecessary data from database.

In that case:

When A edits query try to run it and gather metadata for returned columns. If ColumnNames are not unique, complain to the author. Store metadata with query.
When B adds filter (based on query metadata), store both columns names and conditions.
At execution:
- Check if filter columns are still valid (A might have changed query). If not remove invalid filters and/or inform B.
- Execute query as something like:
```
 select *
 from ({query entered by A}) x
 where x.Column1 op1 Value1
     and x.Column2 op2 Value2
```

If you want to gracefully handle database schema changes you need to add some additional checks to make sure metadata is consistent with what query really returns.

Security note

Your program is going to pass a query written by user A straight to database. It is crucial that you do it using database connection with permissions which do not exceed A's database permissions. Otherwise you are asking for SQL injection based exploits.

Corollary

If user A doesn't have direct access to the database out of security reasons, you cannot use above solution.

In that case the only way to make it safe is to make sure your application understands 100% of the query which means parsing it in your program and allowing only operations you consider safe.

Thank you. I have edited my question to answer the points which you have asked. — Navaneeth K N, Jun 23 '10 at 04:58
Thanks again. This was how it was implemented before. But when we wrap the main query entered by user A as a sub query, it leads into performance problems. Because database engine has to execute the inner query first which will not have any criterias and may lead into full table scan. Any way thanks for all your suggestions. I appreciate it. — Navaneeth K N, Jun 24 '10 at 02:24
Have you actually seen those performance problems? Have you run both versions (wrapped and with added where) by hand and timed them? Have you compared execution plans? And, last but not least, on what RDBMS? — Tomek Szpakowicz, Jun 24 '10 at 11:42
No. That was my assumption. I will be doing tests anyway. I am using SQL server 2005. — Navaneeth K N, Jun 24 '10 at 17:38
Check this assumption first. Measure performance and compare execution plans. Also remember that SQL Server caches query plans based on exact text of the query. It's important if you execute the same query many times. And it would be nice if you wrote here what came out of the tests and how you finally decided to solve the problem. — Tomek Szpakowicz, Jun 25 '10 at 06:05
I verified it. The results was quite surprising to me. Both versions of queries (wrapped and with where) performed equally and the execution plan was identical. I am not sure that this will happen for all the queries or specific to some queries that I tried. Any way I have posted another question to discuss that. http://stackoverflow.com/questions/3131522/adding-inner-query-is-not-changing-the-execution-plan — Navaneeth K N, Jun 28 '10 at 10:48
That's what I've expected. For generic query decent RDBMS should do better job speeding up queries then your application. — Tomek Szpakowicz, Jun 28 '10 at 14:48
Actually you CAN reliably get, cache, and use aliases from execution plan XML as long as the source tables are assigned aliases up front. For example, although `select * from (select * from Lessons) a` will not return any aliases at all, if you simply add an alias after the table name, then the alias sticks to the table up the subquery chain all the way to the top-level output column. For example, `select * from Lessons a, Lessons b, Lessons c, (select * from Lessons d)` produces XML output columns including all 4 table aliases on all columns, even though they're all from the same table. — Triynko, Oct 23 '13 at 09:09

score 4 · Answer 2 · answered Jun 17 '10 at 06:01

4

You could get the execution plan for the query, and then analyse the XML that's returned. This is like using the "Show Estimated Plan" option in Management Studio.

answered Jun 17 '10 at 06:01

Rob Farley

15,625
5
44
58

Thanks. But, execution plan will not give me the table alias. – Navaneeth K N Jun 17 '10 at 06:11
Sure it does. I just ran `SELECT * FROM Production.Product p`, and got this amongst the XML: `` – Rob Farley Jun 17 '10 at 06:27
Great! How are you getting XML? Can you show some small code? – Navaneeth K N Jun 17 '10 at 06:31
Use `set showplan_xml on`, but bear in mind that doing this will stop the query running. So then you'll want to run it again without this option. – Rob Farley Jun 17 '10 at 06:57
I didn't get that for an index plan, but I did get the ColumnReference entries which as long as you select a column from a table will be enough to create alias table mappings... – Cervo Jun 24 '10 at 18:11

score 2 · Answer 3 · edited May 23 '17 at 12:25

It's almost like you need a parser to parse the SQL and then from the parsed query make a symbol table of aliases and the tables they refer to. Then combine that with the results of GetSchemaTable() so that you can map columns to the appropriate alias.

Anyway see the question Parsing SQL code in C# for some parsers. I haven't looked at them in detail but maybe one of them is what you need. If you are only doing select statements check out the ANTLR link and the grammar for http://www.antlr.org/grammar/1062280680642/MS_SQL_SELECT.html.

If your queries are simple, you could probably use regular expressions or your own custom grammar to parse out the aliases and table names from the query. This would probably be the easiest solution.

The most robust solution is probably to pay for someone else's parser that handles full SQL and breaks it up into a parse tree or something else where you can query it. I'm not sure the merits of each one and the price/robustness ratio. But some of them are ultra expensive.... I would say if you can't do it yourself explore the ANTLR grammar (because it is free) assuming you just need select statements. Otherwise you may have to pay....

Actually assuming your users aren't crazy SQL geniuses and using subqueries/etc. I don't see why you can't use the table names from the schema view that you said you got to find them in the query and then find the alias as either tablename alias or tablename as alias. That might work for many of the cases.... But for the full general case you'd need a full parser.....

Yes. If requirements got any bit more complex then what Appu's described the only good solution would be to parse query into expression tree in application. Then you could edit the tree as you like adding additional conditions, sub-queries, joining additional tables etc. You could also control exactly what commands can be used. But this is hard and I don't know any general library. I thought about pointing to the parser in Microsoft Entity Framework (SO answer in your first link) but I have never used it myself, so I don't know if it would work in this case. — Tomek Szpakowicz, Jun 25 '10 at 16:32

score 0 · Answer 4 · edited May 23 '17 at 12:00

Actually, you can. See my answer here: https://stackoverflow.com/a/19537164/88409

What you'll have to do is run all your static queries once with set showplan_xml on, parse the returned XML, and the very first <OutputList> you find will be the top-level output columns. As long as you assign an alias to the tables in your queries when they are first referenced, those aliases will carry through to the output column.

To go even further, I would have to surmise that such aliases CANNOT be optimized out, because the engine would have to use them to differentiate between different instances of the same column from the same table.

In fact, if you run a query like this: select * from Lessons, Lessons, the engine basically tells you as much with the message:

"The objects "Lessons" and "Lessons" in the FROM clause have the same exposed names. Use correlation names to distinguish them."

For example, if you run something like 'set showplan_xml on; select * from Lessons a, Lessons b, Lessons c, (select * from Lessons d) subquery_aliases_wont_stick_like_table_aliases`

You'll get output like this:

<OutputList>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[a]" Column="ID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[a]" Column="Name"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[a]" Column="Description"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[a]" Column="Enabled"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[a]" Column="LessonTypeID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[b]" Column="ID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[b]" Column="Name"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[b]" Column="Description"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[b]" Column="Enabled"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[b]" Column="LessonTypeID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[c]" Column="ID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[c]" Column="Name"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[c]" Column="Description"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[c]" Column="Enabled"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[c]" Column="LessonTypeID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[d]" Column="ID"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[d]" Column="Name"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[d]" Column="Description"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[d]" Column="Enabled"/>
  <ColumnReference Database="[sensitive]" Schema="[dbo]" Table="[Lessons]" Alias="[d]" Column="LessonTypeID"/>
</OutputList>

just to add... this only goes so far. Being able to differentiate columns of the same name cannot occur beyond a single level, because then the column names become ambiguous. For example, although you can run `select * from (select * from Lessons a) a, (select * from Lessons b) b`, you cannot then wrap that entire thing in parenthesis and select * from it as (subquery) c, because the query validation will fail saying something like `The column 'ID' was specified multiple times for 'c'.` — Triynko, Oct 23 '13 at 09:32

score 0 · Answer 5 · answered Jun 24 '10 at 18:08

0

I think Rob Farley's showplan xml will work for you (assuming you are running a late enough SQL Server that has this feature).

Each column seems to have <ColumnReference Server="" Database="" Schema="" Table="" Alias="" Column=""/> for each of the selected columns. Assuming you have at least one column from each table, it should be trivial to make a mapping between alias and table.

answered Jun 24 '10 at 18:08

Cervo

3,112
1
24
27

1

For me it seems like using debug data (symbols, local variable info etc.) in pdb file to peek inside dll when its published interface is not enough. This can give you a lot when you are debugging some problem. But if you rely on this information in real code, it will fail sooner or later. – Tomek Szpakowicz Jun 25 '10 at 06:12

Masanori Sugiura · Answer 6 · 2023-04-11T22:08:39.310

You can get schema name, table name and alias name by using SqlParser "Carbunql".

https://github.com/mk3008/Carbunql

using Carbunql;
using Carbunql.Tables;

var sq = new SelectQuery("select c.id as s,c.firstname from contact as c");

var root = sq.FromClause!.Root;
var table = root.Table;
var t = table as PhysicalTable;
Console.WriteLine("schema : " + t!.Schame);
Console.WriteLine("table  : " + t!.Table);
Console.WriteLine("alias  : " + root.Alias);

Result

schema :
table  : contact
alias  : c

Attention

Physical tables are not the only things that can be specified in the FROM clause. A subquery may also be specified.

In the above sample code, if anything other than the physical table name is specified, the type conversion to the "PhysicalTable" class will fail, so improvement is required to actually use it.

Since the SQL statement is parsed, if the schema name is omitted in the select query, the schema name cannot be obtained.

Getting table schema from a query

6 Answers6

Result

Attention

Linked