0

I am trying to remove all comments from a SQL string in javascript. I have this regex which removes almost all of them but there is an issue.

Please have a look on this link

https://regex101.com/r/JaEWKO/1

When there are nested multi line comments then the regex does not work as expected. The scenario has been mentioned in the link

I've gone through multiple links for this issue on stackoverflow ¯\_(ツ)_/¯

Regex to find sql comments

Regular expression to remove comments from SQL statement

and many others

here is the regex that im currently using

var sql = storedProc.replace(/(--.*)|(((\/\*)+?[\w\W]+?(\*\/)+))/g, '')

I expect it to remove all the comment from the string but it break in case of nested multi line comments.

Can anyone help me with this?

phuzi
  • 12,078
  • 3
  • 26
  • 50
Muhammad Omer
  • 111
  • 2
  • 14

2 Answers2

1

You're essentially looking for balancing groups of /* with */, but unfortunately recursive patterns aren't supported in Javascript. As an alternative, you could try matching /*s followed by */s, with no /* or */s in between, and continually replace with the empty string:

const str = `--GO
--SET ANSI_NULLS ON
--GO
--SET QUOTED_IDENTIFIER ON
--GO

something which is not a comment

/*
    DECLARE @Gas SMALLDATETIME = '1/1/2016',
            @System              VARCHAR(Max) = '30',--'12,76,466,465,30,226',
            /*following param is to help decide Transportation or Storage */
            @Contract         INT = 0,--Transport or (1 = storage). 
            /*following param is to help decide dth or %*/
            -- Contract Period 0 = 5 Years, 1 = 10 years,..., 4=25 years
            
*/`;
let intermed = str.replace(/--.*\n?/g, '');
let match;
while (match = intermed.match(/(?:\/\*)+(?:(?!\*\/|\/\*)[\w\W])+(?:\*\/)+/)) {
  intermed = intermed.replace(match[0], '');
}
console.log(intermed);
CertainPerformance
  • 356,069
  • 52
  • 309
  • 320
  • The tempered greedy token in this case might cause considerable slowdown with long comments. I'd suggest unrolling it, see [my comment](https://stackoverflow.com/questions/55608634/regex-to-remove-nested-multi-line-comment#comment97912254_55608634). Anyway, it is not guaranteed to work as expected in 100% cases. – Wiktor Stribiżew Apr 10 '19 at 09:20
  • @WiktorStribiżew "Anyway, it is not guaranteed to work as expected in 100% cases" Can you elaborate in which case it will fail? – Muhammad Omer Apr 10 '19 at 12:57
  • @MuhammadOmer See https://regex101.com/r/cWCVO8/1. If ``/*`` or ``*/`` appear inside strings, the regex will not match expected strings. – Wiktor Stribiżew Apr 10 '19 at 12:59
  • _/* /*following param is to help decide Transportation or Storage */ @Contract INT = 0,--Transport or (1 = storage). /*following param is to help decide dth or % -- Contract Period 0 = 5 Years, 1 = 10 years,..., 4=25 years */_ This is a valid comment in SSMS but it wont work with the provided regex. Should I be using any parser or something if it is not achievable through regex? If yes then can anyone guide me in that regard. – Muhammad Omer Apr 11 '19 at 08:45
  • @MuhammadOmer The `/*`s and `*/`s are not balanced in that string - what sort of output were you expecting from it? – CertainPerformance Apr 11 '19 at 08:54
  • @CertainPerformance this is exactly what im saying that even when `/*`s and `*/`s are not balanced, it is considered a valid comment in SSMS. I expect this type of comment to be removed from the string as well. I know it will be a lot difficult to achieve it just with the regex so asking about something else if there is any. – Muhammad Omer Apr 11 '19 at 08:59
  • @MuhammadOmer Can you explain the logic behind it? Is it that repeated comment delimiters next to each other (or separated by a space) are considered as a single comment delimiter? – CertainPerformance Apr 11 '19 at 09:11
  • @CertainPerformance Let me explain a little. `/*/**/` is a valid comment in SSMS because it considers **/*** as the text inside the multi line comment section. Answering to _"Is it that repeated comment delimiters next to each other (or separated by a space) are considered as a single comment delimiter?"_ **Yes** even if there are any characters in between. – Muhammad Omer Apr 11 '19 at 11:24
  • @MuhammadOmer If anything can come in between, then you might as well just greedy-match anything that comes in between: [`\/\*.+\*\/`](https://regex101.com/r/JaEWKO/3), right? – CertainPerformance Apr 12 '19 at 00:58
  • @CertainPerformance maybe I am unable to convey my problem statement clearly on this. Anyways I suspect that what I want to do is not achievable with regex. – Muhammad Omer Apr 12 '19 at 21:00
0

For removing all types of comments from SQL, see pg-minify.

It works well for PostgreSQL + MS-SQL + MySQL, plus it can compress the resulting SQL.

vitaly-t
  • 24,279
  • 15
  • 116
  • 138