-1

I have text which looks like this

Key:Value
Key2:Value2
Key3:Value3

i have
a very 
long
uncommom
text which i need to drop

Key4:Valu4

A second example would be

Key:Value
Key2:Value2
Key3:Value3
i have a very long uncommom text which i need to drop

Key4:Valu4

The point is to remove sentence

i have a very long uncommom text which i need to drop

from those two SQL strings. I need to mention in contains CHAR(10) and CHAR(13) inside.

I need to keep new lines on key:value pairs.

How to achieve this?

This wont work.

Replace(@string,'i have a very long uncommom text which i need to drop
','')

Edit: Also is there any way to replace double/triple/quadruple new line breaks with just one?

Dale K
  • 25,246
  • 15
  • 42
  • 71
  • 1
    Looks to me like you need a regex. SQL Server has patindex but it is limited in what it can do. Personally, I'd be trying doing the manipulation outside of the database and then reload the clean data. – DeanOC Oct 31 '19 at 19:33
  • @DeanOC I used SQL trigger to parse this from bunch of HTML code. And i got clean text just trying to sort it out now. I'm afraid i will need to use .NET to parse that locally – Programer Anel Oct 31 '19 at 19:39
  • Would it be adequate to split into separate strings on newlines and then concatenate all of the resulting strings (with newline separators) that, say, don't contain exactly one colon? (Multiple newlines would split into empty strings and vanish.) – HABO Oct 31 '19 at 20:01
  • @HABO are we talking about before parsing HTML or after parsing the HTML . ? – Programer Anel Oct 31 '19 at 20:10
  • @ProgramerAnel After parsing, i.e. with your example data. – HABO Oct 31 '19 at 20:16
  • @HABO That is out of my knowledge. How can i do that ? – Programer Anel Oct 31 '19 at 20:18
  • Since you're using sql server, you could look at creating a CLR trigger which would allow you to clean the text using .Net. – DeanOC Oct 31 '19 at 20:42
  • what version of SQL server are you on? – Tim Mylott Oct 31 '19 at 21:13
  • And I think a better example of the data would be needed as well. You state that the Key/Value pairs would include new lines. How are we to know that the particular text you are trying to remove wasn't actually part of the Key3 value? – Tim Mylott Oct 31 '19 at 21:32

2 Answers2

1

The following code demonstrates one way of splitting the string into separate lines and reassembling the lines that contain a colon. It maintains the order of the input lines and does not require any recent features, e.g String_Split.

-- Sample data.
declare @Newline as VarChar(2) = Char(10) + Char(13);
declare @Sample as VarChar(1024) =
  'Key:Value' + @Newline +
  'Key2:Value2' + @Newline +
  'Key3:Value3' + @Newline +
  @Newline +
  'i have' + @Newline +
  'a very ' + @Newline +
  'long ' + @Newline +
  'uncommom' + @Newline +
  'text which i need to drop' + @Newline +
  @Newline +
  'Key4:Valu4';

-- Display the sample data.
select @Sample as Sample, Replace( @Sample, @Newline, '¶' ) as VisibleSample;

-- Test the splitter.
select ItemNumber, Item, Replace( Item, @Newline, '¶' ) as VisibleItem
  from dbo.DelimitedSplit8K( @Sample, @Newline )
  where Item like '%:%';

-- Reassemble the items that include a colon.
declare @Result as NVarChar(1024) = Stuff(
  ( select @Newline + Item
      from dbo.DelimitedSplit8K( @Sample, @Newline )
      where Item like '%:%'
      order by ItemNumber for XML path(''), type).value('.[1]', 'VarChar(max)' ),
    1, Len( @NewLine ), '' );
select @Result as Result, Replace( @Result, @Newline, '¶' ) as VisibleResult;

The splitter is a modified version of Jeff Moden's code which handles a variable length delimiter:

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter VARCHAR(16))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+ Len( @pDelimiter ) FROM cteTally t WHERE SUBSTRING(@pString,t.N, Len( @pDelimiter ) ) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1 ,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;
HABO
  • 15,314
  • 5
  • 39
  • 57
  • This one is really handy but i can't use it: Its a good use. But not every line i want to keep has ":" inside and your sample would only keep that lines. Lets say my key2 value2 par doesnt have ":" between them. – Programer Anel Nov 02 '19 at 18:08
  • @ProgramerAnel Then what is your question? `'a very'` doesn't contain a colon, yet you somehow intuit that it should be discarded. Does the name/value pair always occur on a single line, i.e. the code is applicable but the "discard" logic needs to be enhanced? Could `'shoe width EEE'` be a name/value pair? – HABO Nov 02 '19 at 20:49
0

This should work:

declare @string nvarchar(max) = 
'Key:Value
Key2:Value2
Key3:Value3
i have a very 
long uncommom text which i need to drop

Key4:Valu4'

DECLARE @Str NVARCHAR(MAX);
DECLARE @Result NVARCHAR(MAX)='';

--This is just for test the cursor query
SELECT REPLACE(REPLACE(value, CHAR(13), ''), CHAR(10), '') 
FROM STRING_SPLIT(@string, CHAR(13))
WHERE CHARINDEX(':', value) != 0

DECLARE db_cursor CURSOR FOR 
    SELECT REPLACE(REPLACE(value, CHAR(13), ''), CHAR(10), '') 
    FROM STRING_SPLIT(@string, CHAR(13))
    WHERE CHARINDEX(':', value) != 0

OPEN db_cursor  
FETCH NEXT FROM db_cursor INTO @Str  

WHILE @@FETCH_STATUS = 0  
BEGIN  
      SET @Result += @Str + CHAR(13) + CHAR(10); 

      FETCH NEXT FROM db_cursor INTO @Str 
END 

CLOSE db_cursor  
DEALLOCATE db_cursor

SELECT @Result

The Query uses the "STRING_SPLIT" function to reach to each row individually and then decide to keep the row or not based on existing of the ":" character. In the end the cursor loop creates attach all the valid rows (Key/Value).

  • From [`String_Split`](https://learn.microsoft.com/en-us/sql/t-sql/functions/string-split-transact-sql?view=sql-server-ver15&viewFallbackFrom=sql-server-2014): "The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string." That may or may not matter to the OP. And it requires at least SQL Server 2016. – HABO Nov 01 '19 at 02:38