2

My question should be simple for many of you Supouse I have the following SQL and I want to get the table name using regexp:

SELECT name, age FROM table1

Using this expression I can get that ok

Pattern p = Pattern.compile(".*FROM\\s+(.*?)($|\\s+[WHERE,JOIN,START\\s+WITH,ORDER\\s+BY,GROUP\\s+BY])", Pattern.CASE_INSENSITIVE);
        Matcher result = p.matcher(pSql);
        if (result.find()) {
            lRetorno = result.group(1);
        }

But, in case the table name contains the schema name (xyz.table1) my expression brings everything. My question is ... what do I need to modify on this query to only return me the table name without schema/owner?

Any help would be extremely apreciated Regards

Raphael Moita

kosa
  • 65,990
  • 13
  • 130
  • 167

2 Answers2

5

Maybe try it this way

String data1="SELECT name, age FROM table1 whatever";
String data2="SELECT name, age FROM schema.table1 whatever";

Pattern p=Pattern.compile("from\\s+(?:\\w+\\.)*(\\w+)($|\\s+[WHERE,JOIN,START\\s+WITH,ORDER\\s+BY,GROUP\\s+BY])",Pattern.CASE_INSENSITIVE);

//test
Matcher m=p.matcher(data1);
while(m.find())
    System.out.println(m.group(1));
m=p.matcher(data2);
while(m.find())
    System.out.println(m.group(1));

output:

table1 
table1 

Edit

I just realized that part ($|\\s+[WHERE,JOIN,START\\s+WITH,ORDER\\s+BY,GROUP\\s+BY]) doesn't work as it should because in my input i placed "whatever" after table name and it was found anyway.

It doesn't work like you because you are using [WHERE,JOIN,START\\s+WITH,ORDER\\s+BY,GROUP\\s+BY] instead of (WHERE|JOIN|START\\s+WITH|ORDER\\s+BY|GROUP\\s+BY). For example [abc] is equal to (a|b|c) so it says regular expression engine to accept any character from that set, not a word abc. Improve your pattern to something like

Pattern p=Pattern.compile("from\\s+(?:\\w+\\.)*(\\w+)(\\s*$|\\s+(WHERE|JOIN|START\\s+WITH|ORDER\\s+BY|GROUP\\s+BY))",Pattern.CASE_INSENSITIVE);
Pshemo
  • 122,468
  • 25
  • 185
  • 269
  • Pshemo: it works fine, thanks buddy! Would you mind to explain me what exactely (?:\\w+\\.)*(\\w+) does? What does this colon in front the question mark is doing there? – VeryNiceArgumentException Jul 09 '12 at 20:12
  • 1
    Group with `?:` at the beginning will not be counted. Thanks to that I was able to call `m.group(1)` always, even if there was some data in `(?:\\w+\\.)`. – Pshemo Jul 09 '12 at 20:19
  • You make me think ... I don't need to control any set of words like I was doing [WHERE|JOIN|START ..., I just need to figure out where is the end of the table name: Pattern p=Pattern.compile(".*FROM\\s+(?:\\w+\\.)*(\\w+)(\\s*$|\\s+\\w+)",Pattern.CASE_INSENSITIVE); Right?! – VeryNiceArgumentException Jul 09 '12 at 20:46
  • Let me make other question ... I just realized that I also need to remove the table alias from the column names (SELECT xyz.name, xyz.age FROM table1 xyz). I'm doing that using two extra steps: "result.group(1).replaceAll("\\w+\\.", "").replaceAll("(\\s+,|,\\s+)", ",")". Do you think I could do it in one shot changing something on the main expression? – VeryNiceArgumentException Jul 10 '12 at 10:58
  • 1
    If the way you are replacing it now works for you and you understand how it works then I would leave it that way. But if you insists to replace it in one `replaceAll` method you can try something like this `replaceAll("\\w+\\.|(?<=,)\\s+|\\s+(?=,)", "")`. To understand `(?<=,)\\s+` or `\\s+(?=,)` read [this grate tutorial](http://www.regular-expressions.info/lookaround.html) – Pshemo Jul 10 '12 at 11:45
  • This regex does not works. Checked in https://regex101.com/ – mohor chatt Sep 30 '22 at 09:43
  • @mohorchatt Could you share link with examples? Something like https://regex101.com/r/3erxQq/1 (you need to use "save regex" option to generate such link when you are ready). – Pshemo Sep 30 '22 at 13:42
2

Rather than parsing with a regular expression, you might have a better time using an actual parser. See this related question Need java API to parse SQL statements which recommends the Zql library.

It's got examples of the queries it can parse that seem at least as robust as any regex you're going to write.

InputStream is = new ByteArrayInputStream(sqlQueryString.getBytes("UTF-8"));
ZqlParser parser = new ZqlParser(is);
ZQuery query = (ZQuery)parser.readStatement();
Vector tables = query.getFrom();

should do it for you. tables is a Vector of ZFromItems, and you can do getTable() on one to get its name without the schema component.

Community
  • 1
  • 1
Nathaniel Waisbrot
  • 23,261
  • 7
  • 71
  • 99