For a statement with no sub-queries, I'd simply start with a simple algorithm.
First replace all white-space (spaces, tabs, newlines, etc) with a single space.
Then find the first occurrence of " where "
(note the surrounding spaces) - that's the starting point (excluding the initial space, of course).
From there, locate the first " order by "
or " group by "
- that's your ending point. If that text doesn't exist, the ending point is the end of the string itself.
In pseudo-code, it will be something like:
def getWhereClause (query):
query = query.replaceAll ("[ \t\n]*", " ")
whereClause = ""
while query is not "":
if query.startsWithIgnoreCase (" where "):
whereClause = "where "
query = query.substring (7)
next while
if query.startsWithIgnoreCase (" order by ") or
if query.startsWithIgnoreCase (" group by "):
return whereClause
if whereClause != "":
whereClause = whereClause + query.substring (0,1)
query = query.substring (1)
return whereClause
That should cover the vast majority of non-sub-query queriesand you can add more possibilities to the end conditions as they appear.
One thing to keep in mind is that those text searches and manipulations should really only be done outside of quoted areas. This means you'll need a mini-parser for the statement rather than simple string searches as shown in the pseudo-code if you want a proper solution.
I put forward this solution as a (rather quick and dirty) one since it may well do exactly what you need without the complications of a full SQL parser which, believe me, is not a trivial exercise, from painful experience :-)