0

I need a regex expresssion which match below strings in java.

select * from test

select * from test where a=100

select * from test where a=100 and b=100

the select statement may or may not contain where condition, also it may contain 1 to n number of conditions, i need a single regex to match all of those, can anyone help

i tried to match the string by using below regex expression, but it fails select (([A-Za-z0-9]{1,20}(|\\,))*|\\*) from ([A-Za-z0-9]{1,20})( where ([A-Za-z0-9]{1,20}((| )(\\=|\\>|\\<)(| ))[A-Za-z0-9]{1,20})|$)

Criteria:

i) a string should start with select

ii) if where is present it should be followed by a pair of variable and value

  like below:

        `select * from test where a=10;`

iii) Between two variable and value pair and should be present

  like below:
         `select * from test where a=10 and b=10`  

iv) a string may or may not contain where

v) a string should not end with and

public class Test {
public static Pattern pattern;
public static Matcher matcher;

private static final String PATTERN="(\\s)*select(\\s)+((\\*)|([a-zA-Z0-9]+))(\\s)+from(\\s)+[a-zA-Z0-9]*(\\s)+where(\\s)+(‌​[a-zA-Z0-9]*(\\s)*=(\\s)*[0-9]*)+(\\s)*((\\s)*and(\\s)*([a-zA-Z0-9]*(\\s)*=(\\s)*[0-9]*)‌​)*(\\s)*";


public static void main(String[] args)throws IOException {
    BufferedReader br=new BufferedReader(new InputStreamReader(System.in));
    String data;        
    pattern=Pattern.compile(PATTERN,Pattern.CASE_INSENSITIVE);
    do{         
    data=br.readLine();
    matcher=pattern.matcher(data);
    if(matcher.find()){
        System.out.print("True");
    }else{
        System.out.print("false");
    }
    System.out.print(":"+data+":\n");
    }while(data!="exit");
}

}

Test Cases(should show true)

select a from test

select a from test where a=10

select a from test where a=10 and b=10

select a from test where a=10 and b=10 and c=10

Test Cases(should show false)

select a from test where

select a from test where a=10 and

select a from test where a=10 b=10

select a from test where a=10 and b=10 and

thanks

Pradeep
  • 51
  • 1
  • 8
  • You ought to make a start on this and submit your progress: that way we know how to gauge the answer detail. – Bathsheba May 15 '14 at 07:30
  • 2
    @Pradeep I'd put that into your question – awksp May 15 '14 at 07:33
  • 1
    Please put your regular expression in the actual question: it doesn't really belong in a comment; you'll attract downvoting the way things are. – Bathsheba May 15 '14 at 07:34
  • So you just want to match each line beginning with `select * from test`, or you want to parse the conditions or...? Can you give an expected output? – Robin May 15 '14 at 07:38
  • its like sql select statment, it may be vary on each case. below are the example string select a from test select a,b from group select * from group where a=1 and b=10 etc – Pradeep May 15 '14 at 07:41
  • Don't forget to escape the backslashes: you need \\ rather than \. – Bathsheba May 15 '14 at 07:44
  • So you want to match any `select` sql statement? Can you just match everything from one `select` to the next `select` keyword? – Robin May 15 '14 at 07:46
  • @Pradeep please differentiate between **match string** and **validate syntax**! – bobbel May 15 '14 at 07:48
  • yes i want to match sql select statement – Pradeep May 15 '14 at 07:48
  • @bobbel Regex Expression: `select (([A-Za-z0-9]{1,20}(|\\,))*|\\*) from ([A-Za-z0-9]{1,20})( where ([A-Za-z0-9]{1,20}((| )(\\=|\\>|\\<)(| ))[A-Za-z0-9]{1,20})|$)` matching String: **select a from test** **select a,b from group** **select * from group where a=1 and b=10** – Pradeep May 15 '14 at 07:54
  • Phrasing it differently: is [this](http://regex101.com/r/tP3vD1) enough? Why/were are you matching the strings? – Robin May 15 '14 at 07:55
  • http://stackoverflow.com/questions/139926/regular-expression-to-match-common-sql-syntax – bobbel May 15 '14 at 07:56
  • @robin thanks for your quick response. the expression matching all string, it even its wrong. It should match : **select * from test** not **select * from test** where if **where** is present, it should be followed by a variable and a value if more then one variable and value pair occurs it should be followed by a **and** – Pradeep May 15 '14 at 08:23
  • @Pradeep: yes, it will match those. This means you have more requirements. Please edit your question to let us know *exactly* what you want to match/don't match, why, how... There are many ways to match the 3 strings you gave. Explain which one you need. – Robin May 15 '14 at 08:25
  • @robin i have updated my requirement to the question – Pradeep May 15 '14 at 08:36
  • @bobbel: so you are saying there is no solution for this? – Pradeep May 15 '14 at 08:39
  • @Pradeep I mean: with a regular expression, you are not able to validate **all** SQL statements (since you didn't mention what you really want to match or validate). May I ask you, what you really want to do? What do you want to achieve by matching these statements? From whom do you get these statements? You could just use a SQL parser if you want to parse SQL, couldn't you? Did you know, that there could be aliases, string functions and other cool things? At this point I want to ask another question: which SQL dialect do you want to use? Any? – bobbel May 15 '14 at 09:05
  • @bobbel i don't want to validate all the select statement in SQL, i need to validate the above cases alone, its a part in my college project, need to validate user select statement string. doing my project in java – Pradeep May 15 '14 at 10:25

2 Answers2

0

I am new to Regular expressions. After reading this question i tried the following in online regex testers. it works fine.

(\s)*select(\s)+((\*)|([a-zA-Z0-9]+))(\s)+from(\s)+[a-zA-Z0-9]*(\s)+where(\s)+([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*)+((\s)+and(\s)+([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*))*(\s)*

I think it is too long. But it works fine for case 2 and case 3. If you want to achieve only these cases through regex , you need two regex to achieve your desired solution.

If you want to achieve all the three in single regex its not possible[According to my knowledge] .
As Bobbel mentioned in comments , Its not possible.

Because you need to check whether "where" is present or not . If present , it should have condition. Else "where" itself shouldn't be present.

EDIT

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld
{
  private static final String PATTERN="(\\s)*select(\\s)+((\\*)|([a-zA-Z0-9]+))   (\\s)+from(\\s)+[a-zA-Z0-9]*(\\s)+where(\\s)+([a-zA-Z0-9]*(\\s)*=(\\s)*[0-9]*)+ ((\\s)+and(\\s)+([a-zA-Z0-9]*(\\s)*=(\\s)*[0-9]*))*(\\s)*";

 public static void main(String []args)
 {
    System.out.println("Hello World");
    String data="select * from asdf where a  = 20 and ax =20";        
    Pattern pattern=Pattern.compile(PATTERN); 
    Matcher matcher=pattern.matcher(data);
    if(matcher.find())
    {
      System.out.print("True");
    }
    else
    {
      System.out.print("false");
    }
    System.out.print(":"+data+":\n");
 }

}

See screenshots

Community
  • 1
  • 1
Gibbs
  • 21,904
  • 13
  • 74
  • 138
  • It is also possible matching all three in one with something like this: `select...([\s]+where...([\s]+and...)*)*` - http://regex101.com/r/uY2iP0. But anyway, you are very strict with these expressions. What about `IS NULL` or `<`, `>`, `>=`, `<=`. I still don't know, what @Pradeep expects... – bobbel May 15 '14 at 12:08
  • Yeah it's still possible. It ll be too large regex. He specifies only "and" . So i din't check for relational operators. – Gibbs May 15 '14 at 12:26
  • That's what i given above in that regex. It works even if you add 100 "and". – Gibbs May 15 '14 at 12:42
  • @user3168736: the regex which you provided is not satisfying my criteria which i mentioned in my question. – Pradeep May 26 '14 at 09:44
  • @Pradeep Try the following `(\s)*select(\s)+((\*)|([a-zA-Z0-9]+))(\s)+from(\s)+[a-zA-Z0-9]*(\s)+where(\s)+([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*)+(\s)*((\s)*and(\s)*([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*))*(\s)*` And could you please post your test case examples. – Gibbs May 26 '14 at 11:21
  • @user3168736 : i have updated my code and test cases in my question. – Pradeep May 27 '14 at 05:18
  • @pradeep , I checked the above cases with [http://java-regex-tester.appspot.com/]. As i said u need two or more regex to check cases with `where` condition and without `where` condition. The following regex ll work if you have `where` condition `(\s)*select(\s)+((\*)|([a-zA-Z0-9]+))(\s)+from(\s)+[a-zA-Z0-9]*(\s)+where(\s)+([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*)+((\s)+and(\s)+([a-zA-Z0-9]*(\s)*=(\s)*[0-9]*))*(\s)*` – Gibbs May 27 '14 at 05:36
  • It returns false in all the test cases in list 2 and satisfies all the test cases except first in list 1 – Gibbs May 27 '14 at 05:37
  • @user3168736 : im getting false only for all test cases – Pradeep May 27 '14 at 06:50
  • @user3168736 : please see my edit above , i'm getting false only. – Pradeep May 27 '14 at 08:26
  • Error maybe in your pattern declaration. please Check it clearly – Gibbs May 27 '14 at 10:24
0

Finally i found solution for my question. this will match all my criteria.

select\s(([A-Za-z0-9]{1,20})(|\,[A-Za-z0-9]{1,20})+|\*)\sfrom\s([A-Za-z0-9]{1,20})(\swhere\s[A-Za-z0-9]{1,20}\=[A-Za-z0-9]{1,20}($|(\sand\s[A-Za-z0-9]{1,20}\=[A-Za-z0-9]{1,20})+$)|$)

Hope it will help anyone in future.

Pradeep
  • 51
  • 1
  • 8