Try the following regex, which should yield `$true` only if a comment-internal `GO` is found; note that it also correctly detects `GO` followed by a (decimal) number:
```powershell
@'
/* a comment with GO, but not on its own line */
/* This GO should be found.
GO 12
*/
/* This one is outside a comment */
GO
'@ -match '(?sm)/[*](.(?![*]/))+?^\s*go(\s+\d+)?\s*$'
```
The above yields `$true` due to the presence of the comment-embedded `GO 12`.
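As a quick sanity check, here is a minimal sketch that runs the same pattern against three small, made-up inputs: one where `GO` appears only mid-line inside a comment, one where a standalone `GO` follows an already closed comment, and one where `GO 5` sits on its own line inside a comment. The variable names are purely illustrative.

```powershell
# The same pattern as above, stored once for reuse.
$pattern = '(?sm)/[*](.(?![*]/))+?^\s*go(\s+\d+)?\s*$'

# GO appears inside a comment, but not on its own line: no match expected.
$inline = '/* a comment with GO, but not on its own line */'

# A standalone GO follows an already closed comment: no match expected.
$outside = "/* a comment */`nGO"

# A standalone 'GO 5' line inside a block comment: match expected.
$inside = "/* start`nGO 5`n*/"

$inline -match $pattern   # False
$outside -match $pattern  # False
$inside -match $pattern   # True
```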
- `(?sm)` turns on inline options `s` (make `.` match `\n` too) and `m` (make `^` and `$` match the start and end of lines too).
- `/[*]` matches the opening of a block comment (`*` is a metacharacter that must be escaped (`\*`) in order to be interpreted literally, or specified inside a character set (`[...]`), as here).
- `(.(?![*]/))+?` matches a single character (`.`) not followed by literal `*/` (using `(?!...)`, a negative lookahead), one or more times (`+`), but non-greedily (`?`).
  - This is the key to matching a `GO` line truly only inside a block comment.
- `^\s*go` matches the start of a line (`^`), followed by a possibly empty run of whitespace (`\s*`), followed by literal `go` (note that PowerShell's `-match` operator is case-insensitive).
- `(\s+\d+)?` optionally (`?`) matches a nonempty run of whitespace (`\s+`) followed by one or more (`+`) digits (`\d`).
- `\s*$` matches a possibly empty run of whitespace through to the end of the line.
Assuming that all block comments are well-formed, there's no need to match the remainder of the comment.
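One edge case worth knowing about: because the lookahead in `(.(?![*]/))+?` is tested only *after* a character has been consumed, the position right after `/*` is never checked against `*/`. As a result, an empty comment `/**/` followed by a `GO` line on the next line is matched too. Testing the lookahead *before* consuming each character, i.e. `(?:(?![*]/).)+?`, closes that gap. The sketch below compares the two; the second pattern is a suggested variant, not the one used elsewhere in this answer.

```powershell
# Pattern from above: lookahead checked after each character is consumed.
$original = '(?sm)/[*](.(?![*]/))+?^\s*go(\s+\d+)?\s*$'

# Suggested variant: lookahead checked before each character is consumed,
# so no character that begins a closing */ is ever consumed.
$variant = '(?sm)/[*](?:(?![*]/).)+?^\s*go(?:\s+\d+)?\s*$'

# An empty block comment followed by a GO line outside of it.
$emptyComment = "/**/`nGO"

$emptyComment -match $original  # True (false positive)
$emptyComment -match $variant   # False

# Both still detect a GO line inside a comment.
"/* x`nGO`n*/" -match $original  # True
"/* x`nGO`n*/" -match $variant   # True
```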
To go beyond merely rejecting undesired input, TheMadTechnician suggests using `-split`, which can effectively remove those block comments that have embedded `GO` lines from the input:
```powershell
$sanitized = @'
/* a comment with GO, but not on its own line */
before
/* This GO should be found.
GO 12
*/
after
/* This one is outside a comment */
GO
...
/* Another comment with a GO.
foo
GO
*/
last
'@ -split '(?sm)/[*](?:.(?![*]/))+?^\s*go(?:\s+\d+)?\s*$.+?[*]/' -join ''
```
The above stores the following in variable `$sanitized`; note that the block comments with embedded `GO` statements are gone:
```
/* a comment with GO, but not on its own line */
before
after
/* This one is outside a comment */
GO
...
last
```
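Two side notes on the `-split` approach, illustrated with made-up input. First, splitting on a pattern and joining the pieces with `''` gives the same result as `-replace` with an empty replacement string. Second, the pattern uses non-capturing groups `(?:...)` rather than `(...)` for a reason: `-split` includes the text captured by capturing groups in its output, which would reinsert fragments of the removed comments.

```powershell
$pattern = '(?sm)/[*](?:.(?![*]/))+?^\s*go(?:\s+\d+)?\s*$.+?[*]/'
$sql = "before`n/* GO inside`nGO`n*/`nafter"

# Split-and-join vs. replace-with-nothing.
$viaSplit = $sql -split $pattern -join ''
$viaReplace = $sql -replace $pattern, ''
$viaSplit -eq $viaReplace  # True

# Why (?:...) matters: text captured by (...) ends up in -split's output.
('a1b' -split '\d').Count    # 2  ('a', 'b')
('a1b' -split '(\d)').Count  # 3  ('a', '1', 'b')
```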
If you then want to split the resulting script into its constituent batches at the remaining (uncommented, effective) `GO` statements:
```powershell
$sanitized -split '(?m)^\s*go(?:\s+\d+)?\s*$'
```
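A minimal sketch of post-processing the batches, assuming you want each batch as a trimmed string and no empty entries (such as the one produced after a trailing `GO`); the input is made up for illustration.

```powershell
$sanitized = "SELECT 1`nGO`nSELECT 2`ngo 3`nSELECT 3`nGO"

# Split at each remaining GO line, trim the line breaks around each batch,
# and drop empty batches (e.g. the one after a trailing GO).
$batches = $sanitized -split '(?m)^\s*go(?:\s+\d+)?\s*$' |
  ForEach-Object Trim |
  Where-Object { $_ -ne '' }

$batches.Count  # 3
$batches[1]     # SELECT 2
```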
As you point out, `GO` isn't actually a part of T-SQL:

> `GO` is not a Transact-SQL statement; it is a command recognized by the `sqlcmd` and `osql` utilities and SQL Server Management Studio Code editor.
As for what you tried:
Your `/\*(.?([^*][^/])*?)^\s*?go` subexpression (simplified here), intended to match the start of a block comment up to an embedded `GO`, is ineffective at ensuring that the substring `*/` is not present; it produces both false positives and false negatives.
Example of a false positive (matches, but shouldn't):

```
/*a*/
go
```
Example of a false negative (doesn't match, but should):

```
/*a*
go
```
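Running both examples against your subexpression and against the lookahead-based pattern from the top of this answer makes the difference visible. The `(?sm)` prefix on your subexpression is an assumption, since the options you used aren't shown.

```powershell
# The question's subexpression; the (?sm) options are assumed here.
$pairBased = '(?sm)/\*(.?([^*][^/])*?)^\s*?go'
# The lookahead-based pattern from the top of this answer.
$lookahead = '(?sm)/[*](.(?![*]/))+?^\s*go(\s+\d+)?\s*$'

$falsePositive = "/*a*/`ngo"  # GO is outside the (closed) comment
$falseNegative = "/*a*`ngo"   # GO is inside the (still open) comment

$falsePositive -match $pairBased  # True  (wrong)
$falsePositive -match $lookahead  # False
$falseNegative -match $pairBased  # False (wrong)
$falseNegative -match $lookahead  # True
```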
As you suspected in a comment, the problem is that `[^*][^/]` consumes characters in pairs, so a `*/` is only rejected if it happens to align with a pair; whether it does ultimately depends on whether the number of preceding input characters is odd or even. Using simplified examples:
```powershell
# Even number of chars. -> $false, as intended
'*/' -match '^(.?([^*][^/])*?)$'
# Odd number of chars. -> $true(!)
'*/a' -match '^(.?([^*][^/])*?)$'
```
A negative lookahead assertion, as shown above, is the most straightforward way to reliably exclude a given (multi-character) string.
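To mirror the simplified examples above, here is a sketch of a lookahead-based counterpart, `^(?:(?!\*/).)*$`, which matches only strings that don't contain `*/`, regardless of whether their length is odd or even. The pattern is illustrative and not taken from the question.

```powershell
# Each character is consumed only if */ doesn't start at its position.
$noCloser = '^(?:(?!\*/).)*$'

'*/'  -match $noCloser  # False (even length)
'*/a' -match $noCloser  # False (odd length)
'a*/' -match $noCloser  # False
'*a/' -match $noCloser  # True  (no */ present)
```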