I've a text file containing large number of queries. I want to get all the distinct tables used in the entire file in all the queries. The table name can come after a 'from' or 'join'. How can i extract them by doing a regex match. Can anyone suggest a regular expression to get the matches?
6 Answers
It depends on structure of your file. Try to use this:
(?<=from|join)(\s+\w+\b)
Also turn on options Multiline if your not split your file in array or smth else with singleline string members. Also try to turn on IgnorCase option.

- 990
- 2
- 12
- 25
-
I think you have to move the \s+ into the positive lookbehind – macf00bar Nov 18 '10 at 11:41
-
1-1 Regex is not the correct way to solve this problem. As tdammers states, a SQL parser of some description is required to effectively solve this problem. – El Ronnoco Nov 18 '10 at 11:54
-
2@El Ronnoco I've asked for a solution using Regex and hence he has provided it. I just want a quick and dirty solution for this and i got it. – NLV Nov 18 '10 at 11:57
-
For support schema, i add few symbol: (?<=from|join)(\s+\w+\.+\w+\b) – Reza ArabQaeni Jun 15 '15 at 10:46
-
I think it's not gonna work if there is a commented select in your string. – Andre Soares Mar 18 '16 at 03:08
-
Actually there are a few edge cases on String functions: https://dev.mysql.com/doc/refman/5.7/en/string-functions.html SUBSTR, SUBSTRING and TRIM can specify some parameters using FROM keyword, which has nothing to do with tables – davidaam Jun 01 '17 at 13:16
-
you need to add INTO as this wouldn't work for insert statement – Tobi Owolawi Dec 17 '18 at 15:20
I'd use:
r = new Regex("(from|join)\s+(?<table>\S+)", RegexOptions.IgnoreCase);
once you have the Match object "m", you'll have the table name with
m.Groups["table"].Value
example:
string line = @"select * from tb_name join tb_name2 ON a=b WHERE x=y";
Regex r = new Regex(@"(from|join)\s+(?<table>\S+)",
RegexOptions.IgnoreCase|RegexOptions.Compiled);
Match m = r.Match(line);
while (m.Success) {
Console.WriteLine (m.Groups["table"].Value);
m = m.NextMatch();
}
it will print: tb_table tb_table2

- 11,655
- 1
- 30
- 43
Something like this maybe:
/(from|join)\s+(\w*\.)*(?<tablename>\w+)/
It won't match escaped table names though, and you need to make the regex evaluation case-insensitive.

- 20,353
- 1
- 39
- 56
-
-
absolutely wrong. You can't find table names with such regex. You will match words join and from also. Not only table names. – arena-ru Nov 18 '10 at 11:22
-
1I haven't tested it, but it should match each occurrence of 'from' or 'join', followed by at least one whitespace, and then one or more identifiers separated with dots. There is one error though; it should start with a start-of-word assertion, otherwise it will also match things like `foobarfrom blah`. And it doesn't take escaping into account, because that's a DBMS-specific thing - MySQL uses backquotes, PostgreSQL uses double quotes, T-SQL uses square brackets. – tdammers Nov 18 '10 at 11:29
-
1Oh, and obviously, regexes are not a reliable way for doing this anyway. If you want reliable, you need a full-blown SQL parser. – tdammers Nov 18 '10 at 11:30
can try this but it doesnt work for all the types of query,
public void Main()
{
// TODO: Add your code here
string Line = string.Empty;
using (StreamReader sr = new StreamReader(@"D:\ssis\queryfile.txt"))//reading the filename
{
var text = string.Empty;
do
{
// MessageBox.Show(Line);
text = Line = sr.ReadToEnd();// storing it in a variable by reading till end
MessageBox.Show(Line);
} while ((Line = sr.ReadLine()) != null);
var text1 = text.Replace("[", string.Empty).Replace("]", string.Empty);//replacing brackets with empty space
MessageBox.Show(text1);
Regex r = new Regex(@"(?<=from|join)\s+(?<table>\S+)", RegexOptions.IgnoreCase | RegexOptions.Compiled);//regex for extracting the tablename after from and join
Match m = r.Match(text1);//creating match object
MessageBox.Show(m.Groups[1].Value);
var v = string.Empty;
while (m.Success)
{
v = m.Groups[0].Value;
m = m.NextMatch();
StreamWriter wr = new StreamWriter(@"D:\ssis\writefile.txt", true);// writing the match to the file
var text2 = v.Replace(".", " ,"); // replace the . with , seperated values
wr.WriteLine(text2);
sr.Close();
wr.Close();
}
}
}

- 33
- 3
Solutions that can help you.
1. Extract table names with alias from an SQL statement with Regex
Regular expression
/(from|join|into)\s+([`]\w+.*[`] *\w+|(\[)\w+.*(\]) *\w+|\w*\.*\w+ *\w+)/g
2. Extract table names from an SQL statement with Regex
Regular expression
/(from|join|into)\s+([`]\w+.+\w+\s*[`]|(\[)\w+.+\w+\s*(\])|\w+\s*\.+\s*\w*|\w+\b)/g
Test string
-------------------------------------------------------
select * into [dbo].[table_temp]
from [dbo].[table_a] a inner join dbo.table_b b ...
join table_c c on ...
from dbo.table_d d ...
from `dbo`.`table_e` e ...
from table_f f ...
-------------------------------------------------------
Generated Code for C#
using System;
using System.Text.RegularExpressions;
public static class QueryExtension
{
public static List<string> GetTables(this string query)
{
List<string> tables = new List<string>();
string pattern = @"(from|join|into)\s+([`]\w+.+\w+\s*[`]|(\[)\w+.+\w+\s*(\])|\w+\s*\.+\s*\w*|\w+\b)";
foreach (Match m in Regex.Matches(query, pattern))
{
string name = m.Groups[2].Value;
tables.Add(name);
}
return tables;
}
public static string Join(this IEnumerable<string> values, string separator) {
return string.Join(separator, values);
}
}
How to use it.
string input = @"select * into [dbo].[table_temp]
from [dbo].[table_a] a inner join dbo.table_b b ...
join table_c c on ...
from dbo.table_d d ...
from `dbo`.`table_e` e ...
from table_f f ...";
Console.WriteLine(input.GetTables().Join("\n"));
Output
[dbo].[table_temp]
[dbo].[table_a]
dbo.table_b
table_c
dbo.table_d
`dbo`.`table_e`
table_f
Extract column names from an SQL statement with Regex
Regular expression
/(\w*\.*\w+|`\w*.*\w`|(\[)\w*.*(\]))+(,|\s+,|\s+FROM|\s+from)/g
Test string
-------------------------------------------------------
SELECT
[a].[column_1],
`b`.`column_2`,
c.column_3,
col4 as column_4,
col5 as `column_5`,
col6 as [column_6],
column_7,
a.col8 column_8,
(select max(column_x) from table_d where column_y in ('1','2','3')) as column_9
from table_a a
inner join table_b b on ...
inner join table_c c on ...
-------------------------------------------------------
Generated code for C#
public static class QueryExtension
{
public static List<string> GetColumns(this string query)
{
List<string> columns = new List<string>();
string pattern = @"(\w*\.*\w+|`\w*.*\w`|(\[)\w*.*(\]))+(,|\s+,|\s+FROM|\s+from)";
foreach (Match m in Regex.Matches(query, pattern))
{
string name = m.Groups[1].Value;
columns.Add(name);
}
return columns;
}
public static string Join(this IEnumerable<string> values, string separator) {
return string.Join(separator, values);
}
}
How to use it
string input1 = @"SELECT
[a].[column_1],
`b`.`column_2`,
c.column_3,
col4 as column_4,
col5 as `column_5`,
col6 as [column_6],
column_7,
a.col8 column_8,
(select max(column_x) from table_d where column_y in ('1','2','3')) as column_9
from table_a a
inner join table_b b on ...
inner join table_c c on ...
";
Console.WriteLine(input1.GetColumns().Join("\n"));
Output
[a].[column_1]
`b`.`column_2`
c.column_3
column_4
`column_5`
[column_6]
column_7
column_8
column_9
References
Regular Expression Language - Quick Reference
Simplified Regular Expressions
Try regular expressions online
-
I was looking for some ideas for myself, but I did not find them as expected, so I have to leave something. I hope you enjoy it. – Luis Eduardo Cochachi Chamorro Aug 24 '21 at 13:37
(from|join)\s(\w+)

- 1,078
- 7
- 17
-
Nope. That will match either just "from" or "join table_name", but not "from table_name". The pipe splits the entire pattern, not just the first part. – tdammers Nov 18 '10 at 11:09
-