2

I am trying to remove all the comments from a NVARCHAR value.

I don't know which value I will get to the NVARCHAR variable and I need to remove all the comments that start with -- until the end of the line.

For example:

-- Some Comments
SET NOCOUNT ON;

-- Some Comments

SELECT FirstName FROM dbo.Users WHERE Id = @Id;

After removing the comments it should look like this:

SET NOCOUNT ON;



SELECT FirstName FROM dbo.Users WHERE Id = @Id;

Is there any easy way doing it in T-SQL?

Thanks in advance.

Misha Zaslavsky
  • 8,414
  • 11
  • 70
  • 116
  • 2
    not easy at all, might wanna try to use another tool for this other than sql – Lamak Nov 02 '17 at 18:39
  • Is all comments start with `--` or there is `/**/` too? – Ilyes Nov 02 '17 at 18:40
  • @Sami Yes, all comments start with -- – Misha Zaslavsky Nov 02 '17 at 18:40
  • 1
    What about when `--` doesn't start at the beginning of a line? What if you get a query that contains `AND Field = 'Hi--there'`? Parsing comments will not be easy. I would suggest another tool; something that can parse SQL and spit it back out without the comments. – Cᴏʀʏ Nov 02 '17 at 18:43
  • My question would be why? Comments in code are there for a reason and removing them is big red flag that something is going on here that would concern me. – Sean Lange Nov 02 '17 at 18:43
  • @SeanLange I know, you are correct, but I have a specific case that I need to remove it... Long to explain.. – Misha Zaslavsky Nov 02 '17 at 18:46
  • @Cᴏʀʏ Thanks, probably that what I will do... – Misha Zaslavsky Nov 02 '17 at 18:47
  • 1
    @MishaZaslavsky [this](http://www.sqlservercentral.com/scripts/un+comment/66353/) could help – Lamak Nov 02 '17 at 18:50
  • 2
    Can I interest you in an [actual parser](https://www.nuget.org/packages/Microsoft.SqlServer.TransactSql.ScriptDom/)? Knowledge of C# required and slight overkill for the actual purpose (a simple state machine can do this particular job as well), but it'll give you a warm fuzzy feeling to know you're only removing comments. (There's also much cooler stuff you can do in terms of transforming the script once you have an actual parser, but that's another matter.) – Jeroen Mostert Nov 02 '17 at 18:50
  • @Lamak Great article. Helped me a lot. Just one thing didn't work and I paid attention that it substrings the comment until 100 characters, changed it to the max and it worked. – Misha Zaslavsky Nov 02 '17 at 19:19

1 Answers1

1

Using ngramsN4k:

CREATE FUNCTION dbo.NGramsN4K
(
  @string nvarchar(4000), -- Input string
  @N      int             -- requested token size
)
/****************************************************************************************
Purpose:
 A character-level N-Grams function that outputs a contiguous stream of @N-sized tokens 
 based on an input string (@string). Accepts strings up to 4000 nvarchar characters long.
 For more information about N-Grams see: http://en.wikipedia.org/wiki/N-gram. 

Compatibility: 
 SQL Server 2008+, Azure SQL Database

Syntax:
--===== Autonomous
 SELECT position, token FROM dbo.NGramsN4K(@string,@N);

--===== Against a table using APPLY
 SELECT s.SomeID, ng.position, ng.token
 FROM dbo.SomeTable s
 CROSS APPLY dbo.NGramsN4K(s.SomeValue,@N) ng;

Parameters:
 @string  = The input string to split into tokens.
 @N       = The size of each token returned.

Returns:
 Position = bigint; the position of the token in the input string
 token    = nvarchar(4000); a @N-sized character-level N-Gram token

Developer Notes:  
 1. NGramsN4K is not case sensitive

 2. Many functions that use NGramsN4K will see a huge performance gain when the optimizer
    creates a parallel execution plan. One way to get a parallel query plan (if the 
    optimizer does not chose one) is to use make_parallel by Adam Machanic which can be 
    found here:
 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx

 3. When @N is less than 1 or greater than the datalength of the input string then no 
    tokens (rows) are returned. If either @string or @N are NULL no rows are returned.
    This is a debatable topic but the thinking behind this decision is that: because you
    can't split 'xxx' into 4-grams, you can't split a NULL value into unigrams and you 
    can't turn anything into NULL-grams, no rows should be returned.

    For people who would prefer that a NULL input forces the function to return a single
    NULL output you could add this code to the end of the function:

    UNION ALL 
    SELECT 1, NULL
    WHERE NOT(@N > 0 AND @N <= DATALENGTH(@string)) OR (@N IS NULL OR @string IS NULL);

 4. NGramsN4K is deterministic. For more about deterministic functions see:
    https://msdn.microsoft.com/en-us/library/ms178091.aspx

Usage Examples:
--===== Turn the string, 'abcd' into unigrams, bigrams and trigrams
 SELECT position, token FROM dbo.NGramsN4K('abcd',1); -- unigrams (@N=1)
 SELECT position, token FROM dbo.NGramsN4K('abcd',2); -- bigrams  (@N=2)
 SELECT position, token FROM dbo.NGramsN4K('abcd',3); -- trigrams (@N=3)

--===== How many times the substring "AB" appears in each record
 DECLARE @table TABLE(stringID int identity primary key, string nvarchar(100));
 INSERT @table(string) VALUES ('AB123AB'),('123ABABAB'),('!AB!AB!'),('AB-AB-AB-AB-AB');

 SELECT string, occurances = COUNT(*) 
 FROM @table t
 CROSS APPLY dbo.NGramsN4K(t.string,2) ng
 WHERE ng.token = 'AB'
 GROUP BY string;

----------------------------------------------------------------------------------------
Revision History:
 Rev 00 - 20170324 - Initial Development - Alan Burstein
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH
L1(N) AS
(
  SELECT 1 FROM (VALUES -- 64 dummy values to CROSS join for 4096 rows
        ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
        ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
        ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),
        ($),($),($),($),($),($),($),($),($),($),($),($),($),($),($),($)) t(N)
),
iTally(N) AS 
(
  SELECT 
  TOP (ABS(CONVERT(BIGINT,((DATALENGTH(ISNULL(@string,''))/2)-(ISNULL(@N,1)-1)),0)))
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))    -- Order by a constant to avoid a sort
  FROM L1 a CROSS JOIN L1 b                       -- cartesian product for 4096 rows (16^2)
)
SELECT
  position = N,                                   -- position of the token in the string(s)
  token    = SUBSTRING(@string,CAST(N AS int),@N) -- the @N-Sized token
FROM iTally
WHERE @N > 0 
-- Protection against bad parameter values
AND @N <= (ABS(CONVERT(BIGINT,((DATALENGTH(ISNULL(@string,''))/2)-(ISNULL(@N,1)-1)),0)));

You can solve it using the solution below. This will be limited to NVARCHAR(4000) but I can put together an NVARCHAR(max) version if you need one. Also note that my solution ignores lines that begin with "--" and grabs everything up to "--" where the comment is deeper in. I'm not adressing /* this comment style */ but could be modified to do so.

Solution

-- sample stored proc
declare @storedproc varchar(8000) =
'-- Some Comments
SET NOCOUNT ON;

-- Some Comments

SELECT FirstName -- we only need the first name
FROM dbo.Users WHERE Id = @Id;';

--select @storedproc;

-- Solution
select cleanedProc = 
(
  select substring(item, 1, isnull(nullif(charindex('--', item),0)-1,nextPos))+br
  from
  (
    select 0 union all
    select position from dbo.ngramsN4k(@storedproc,1) 
    where token = char(10)
  ) d(position)
  cross apply (values (char(10), d.position+1,
           isnull(nullif(charindex(char(10), @storedproc, d.position+1),0),8000))
  ) p(br, startPos, nextPos)
  cross apply (values (substring(@storedproc, startPos, nextPos-startPos))) split(item)
  where item not like '--%'
  order by position
  for xml path(''), type
).value('(text())[1]', 'varchar(8000)');

before

-- Some Comments
SET NOCOUNT ON;

-- Some Comments

SELECT FirstName -- we only need the first name
FROM dbo.Users WHERE Id = @Id;

after

SET NOCOUNT ON;


SELECT FirstName 
FROM dbo.Users WHERE Id = @Id;
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18