0

I'm currently looking for a library to use in my VB.NET WinForms application to do the following:

  • Parse a SQL statement extracted from the command text in a Crystal Reports .rpt file into its individual clauses (SELECT, FROM, JOIN, WHERE, ORDER BY, etc.)
  • Modify one or more of the parsed clauses of that statement
  • Recompile/reconstruct the statement with the changes before piping it back into the .rpt file for execution and report generation.

Specifically, the SQL statement in question is querying against a PostgreSQL database and looks something like this (mixed-case, quoted identifiers and all):

SELECT * FROM "[SCHEMANAME]"."[VIEWNAME]" WHERE "[COLUMNNAME]" IN ('[VALUE1]','[VALUE2]');

Does anyone know of a workable/usable library that will actually do this? I've read through a number of questions on this site as well as other sources found in Google search results for potential solutions, but I've yet to find a viable, working method to accomplish all of this in a reasonable way with exposed properties/methods to "simplify" the process. Just a few of the SO questions I've looked at so far include:

All I really want to do is dynamically take this (or any) SQL statement, break out the WHERE clause and modify it, add an ORDER BY clause, then recompile it and update the Crystal Reports Command data source. Yes, I am aware that this specific example can be handled manually very easily, but I also have a number of more complex queries/statements that I'd potentially like to be able to perform a similar operation on without having to rewrite the entire statement or manually find the positions of the clauses and use String.Replace(), or other more complex methods to achieve that goal.

Regardless, what I'd like to see from whatever solution I end up using would be to parse the above statement with at the very least something like the following results:

Statement Type:    SELECT
SELECT Clause:     SELECT *
FROM Clause:       FROM "[SCHEMANAME]"."[VIEWNAME]"
WHERE Clause:      WHERE "[COLUMNNAME]" IN ('[VALUE1]','[VALUE2]')

Of course, I'd prefer to have it provide additional levels of "drilling down" - a List(Of<T>) (or something) for the columns in the SELECT clause, specific "keys" from the WHERE clause, etc. I've tested with a number of the parsers available, including a trial of the commercial "General SQL Parser" library and the Microsoft.SqlServer.Management.SqlParser library. I've also tried some "less popular" options from the NuGet catalog, including one that uses ANTLR and the GOLD Engine.

In my tests, however, I've not been able to find a totally satisfactory solution. Admittedly, there's an element of unfamiliarity with the tools that definitely plays into my findings, but I'm left wondering if there's anything out there that can do for me what I want or if I'm going to have to "roll my own". I'd like to be able to do things like:

  • WHEREClause.Add("""Column1"" = 'NewValue'")
  • ORDERBYClause.Add("""Column2""", SortOrder.Descending)
  • ORDERBYClause.Clear()
  • SELECTClause.Remove("""ColumnA""", """Alias""")

The Microsoft.SqlServer.Management.SqlParser library seems to do a decent job of parsing a simple query like the above but, obviously, when it comes to some of the PostgreSQL-specific syntax (e.g., concatenation with ||), it throws errors preventing me from accurately reconstructing the SQL statement.

Licensing costs notwithstanding, the General SQL Parser library is one of the only options I've found that supports the PostgreSQL syntax (although it took me a bit to figure out how to get to that point) but doesn't seem to expose the information I need in a useful way for my requirements. If it does, I've not yet been able to get to it. Perhaps I'm being dense, but I've been unable to figure out how to get the online example to work in my own environment. From what I can tell, getting what I want is going to require the creation of a bunch of "extra" bits just to achieve my goal. Besides, its price is a bit steep for my situation as a one-man Development/IT department.

What would be fantastic, IMO, would be if I could simply do the parsing and such with the Npgsql library my application is already using for other database operations, but that doesn't seem to be an option either (unless, again, I'm just overlooking something).

I started playing with the SQL Parser code from Sergey Gorbenko on CodeProject, but this also doesn't "natively" support PostgreSQL syntax, and the amount of work it would take to bring that "up-to-snuff" is beyond what I have the time or gumption to invest at this point.

As I've said, the example above is very simple, but I'd like to be able to use whatever I eventually find for parsing more complex statements with subqueries, joins, unions, and all sorts of other SQL elements without having to jump through a bunch of hoops every time. Are there any other libraries or projects out there that anyone has been able to successfully use to perform these operations (parse, edit, recompile) with a PostgreSQL-specific SQL statement?

G_Hosa_Phat
  • 976
  • 2
  • 18
  • 38

1 Answers1

1

You could take a look at ANTLR. It will generate C# targets and there is an existing SQL grammar (here). This should be able to parse your SQL and produce an in-memory ParseTree data structure that easily exceeds your requirements.

With ANTLR, take a look at the TokenStreamRewriter. It allows you to make modifications to the input stream that you parsed and insert of remove content (preserving the rest of the input stream). I would definitely advise against any string manipulation approach. That's really just a variation on a SQL injection vulnerability.

It your needs somehow outgrow what TokenStreamRewriter provides (probably unlikely), then you have the full parse tree in memory and could modify that structure in place and work out a listener/visitor to serialize it back out to text.

In short, unless you find a purpose-built SQL modifier tool, this gives you the right approach to properly parse everything, correctly identifying all the pieces, then work with that data structure to write out your modified results.

Mike Cargal
  • 6,610
  • 3
  • 21
  • 27
  • Thank you. I'll take another, closer look at ANTLR - I admittedly didn't give it much of a chance because I was looking more for something a bit more specific. However, from what I've seen, there isn't a grammar defined for PostgreSQL. The "closest" one I can find is the Oracle PL/SQL, but perhaps it will be sufficient. My queries can get a bit complicated with `JOIN`s across several tables/views with mixed-case names, `COALESCE` operations, string concatenation, and other little "gotchas" from time-to-time, so I was hoping for a more "focused" utility/library. Still, I'll see how it goes. – G_Hosa_Phat Mar 21 '21 at 01:51
  • Well, I'm not sure why I'm having so much trouble, but I've spent the past four days trying to get a usable ANTLR plsql parser and I'm just getting frustrated. After several attempts, I was finally able to get the grammar files processed with the ANTLR tool to get some C# files, but I can't get anything to compile, let alone actually work. I think I'm just going to have to "drop back and punt". Maybe I'll come back to it later, but for now I'm just going to rebuild the SQL from scratch as I have no idea what I'm doing wrong. Thanks for the help, but I guess I'm just not ready for ANTLR – G_Hosa_Phat Mar 25 '21 at 18:43