0

I have a huge codebase that has a lot of JAVA and .sql files. I intend to extract all the SQL statements from all these files.

Here is the way I intend to achieve this -

  1. Build a regex file containing patterns like select, insert, delete, update etc that I intend to extract.
  2. Parse files line by line in code base and match against Regex. If match is found, register the code line, file name, pattern etc.

The problem comes when the SQL queries are broken into multiple lines.

I am using java.util.regex.Matcher and java.util.regex.Pattern methods to build/match regex patterns when I read lines using Buffered Reader.

  pattern = Pattern.compile(regexString,Pattern.CASE_INSENSITIVE);
  .
  .
  matcher = pattern.matcher(lineBuffer.readline().trim());
  if( matcher.find()){
    //Do something
  }

For Multiline statements, I should look for the line terminator ";" - if not found, I can read next line and append the output in matched string so that it is understood as a single query.

 while(!lineString.endsWith(";")) {
   lineString = lineString + lineBfr.readLine().trim();
 }

Is there a better way to achieve a solution to the requirement here using Pattern.MULTILINE or using StreamBuffers where I read the entire file into a single buffer and process further?

Prasoon
  • 425
  • 1
  • 6
  • 18

3 Answers3

1

You could take advantage of Apache Commons, and their great method FileUtils#readFileToString(File file).
This combined with the Pattern.MULTILINE you mentioned seems like a potentially very simple solution.

Keppil
  • 45,603
  • 8
  • 97
  • 119
1

Unless your file contains only SQL statements in a consistent predictable format, I think that parsing SQL statements with regexes will turn out to be extremely complicated.

It may seem fairly simple on its face, but there are lots and lots of special cases to handle.

  • What about properly handling comments?
  • What about the word SELECT or a semicolon contained within a string?
  • What about string escaping characters?

And so on. If you don't handle all of these, you won't even be able to do something as simple as get all the statements.

A better approach would be to use an existing library to parse the SQL for you. In this question, there appear to be several good options for Java: SQL parser library for Java

Community
  • 1
  • 1
dan1111
  • 6,576
  • 2
  • 18
  • 29
0

For Java files, or more accurately, compiled Java classes, I'm trying something similar, but I'm using the Java decompiler javap -verbose to search for the strings (type Utf8) that start with select or that otherwise appear to be SQL statements.

It's a work in progress, and I'm muddling through myself, so I can't post a complete solution, but this should be enough to build off of.

Menachem
  • 911
  • 7
  • 22