0

I have a string column in my table that contains 'Character-separated' data such as this:

"Value|Data|4|Z|11/06/2012"

This data is fed into a 'parser' and deserialised into a particular object. (The details of this aren't relevant and can't be changed)

The structure of my object has changed and now I would like to get rid of some of the 'sections' of data

So I want the previous value to turn into this

"Value|Data|11/06/2012"

I was hoping I might be able to get some help on how I would go about doing this in T-SQL.

The data always has the same number of sections, 'n' and I will want to remove the same sections for all rows , 'n-x and 'n-y'

So far I know I need an update statement to update my column value. I've found various ways of splitting a string but I'm struggling to apply it to my scenario.

In C# I would do

string RemoveSecitons(string value)
{
    string[] bits = string.split(value,'|');

    List<string> wantedBits = new List<string>();


    for(var i = 0; i < bits.Length; i++)
    {
        if ( i==2 || i==3) // position of sections I no longer want
        {
            continue;
        }
        wantedBits.Add(bits[i]);
    }

    return string.Join(wantedBits,'|');

}

But how I would do this in SQL I'm not sure where to start. Any help here would be appreciated

Thanks

Ps. I need to run this SQL on SQL Server 2012

Edit: It looks like parsing to xml in some manner could be a popular answer here, however I can't guarantee my string won't have characters such as '<' or '&'

Dave
  • 2,829
  • 3
  • 17
  • 44
  • You are removing the values `4` and `Z`. Is there any sort of pattern which identifies the data you want to remove? – Tim Biegeleisen Jan 29 '19 at 16:24
  • In my actual scenario it is an integer value and a date, however the data contains other integer values – Dave Jan 29 '19 at 16:33
  • @Dave . . . Oh, this is such a good example of why you want to store data in columns and not strings. I suspect there is a somewhat elegant XML solution. – Gordon Linoff Jan 29 '19 at 16:39
  • 1
    Have a look at https://stackoverflow.com/questions/10914576/t-sql-split-string as a starting point. – NP3 Jan 29 '19 at 16:45

5 Answers5

3

Using NGrams8K you can easily write a nasty fast customized splitter. The logic here is based on DelimitedSplit8K. This will likely outperform even the C# code you posted.

DECLARE @string VARCHAR(8000) = '"Value|Data|4|Z|11/06/2012"',
        @delim  CHAR(1)       = '|';

SELECT newString = 
(
  SELECT SUBSTRING(
           @string, split.pos+1,
           ISNULL(NULLIF(CHARINDEX(@delim,@string,split.pos+1),0),8000)-split.pos)
  FROM
  (
    SELECT ROW_NUMBER() OVER (ORDER BY d.Pos), d.Pos
    FROM
    (
      SELECT 0 UNION ALL
      SELECT ng.position 
      FROM   samd.ngrams8k(@string,1) AS ng
      WHERE  ng.token = @delim
    ) AS d(Pos)
  ) AS split(ItemNumber,Pos)
  WHERE split.ItemNumber IN (1,2,5)
  ORDER BY split.ItemNumber
  FOR XML PATH('')
);

Returns:

newString
----------------------------
"Value|Data|11/06/2012"
Alan Burstein
  • 7,770
  • 1
  • 15
  • 18
2

You might try some XQuery:

DECLARE @s VARCHAR(100)='Value|Data|4|Z|11/06/2012';

SELECT CAST('<x>' + REPLACE(@s,'|','</x><x>') + '</x>' AS XML)
       .value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)');

In short: The value is trasformed to XML by some string replacements. Then we use the XQuery-concat to bind the first, the second and the fifth element together again.

This version is a bit less efficient but safe with forbidden characters:

SELECT CAST('<x>' + REPLACE((SELECT @s AS [*] FOR XML PATH('')),'|','</x><x>') + '</x>' AS XML)
       .value('concat(/x[1],"|",/x[2],"|",/x[5])','nvarchar(max)')
Shnugo
  • 66,100
  • 9
  • 53
  • 114
2

Just to add a non-xml option for fun:

Edit and Caveat - In case anyone tries this for a different solution and doesn't read the comments...

HABO rightly noted that this is easily broken if any of the columns have a period (".") in them. PARSENAME is dependent on a 4 part naming structure and will return NULL if that is exceeded. This solution will also break if any values ever contain another pipe ("|") or another delimited column is added - the substring in my answer is specifically there as a workaround for the dependency on the 4 part naming. If you are trying to use this solution on, say, a variable with 7 delimited columns, it would need to be reworked or scrapped in favor of one of the other answers here.

DECLARE 
    @a VARCHAR(100)= 'Value|Data|4|Z|11/06/2012'


SELECT 
    PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),4)+'|'+
    PARSENAME(REPLACE(SUBSTRING(@a,0,LEN(@a)-CHARINDEX('|',REVERSE(@a))+1),'|','.'),3)+'|'+
    SUBSTRING(@a,LEN(@a)-CHARINDEX('|',REVERSE(@a))+2,LEN(@a))
GreyOrGray
  • 1,575
  • 8
  • 14
2

Not the most elegant way, but works:

SELECT SUBSTRING(@str,1, CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)-1)
   + SUBSTRING(@str, CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,CHARINDEX('|',@str,1)+1)+1)+1), LEN(@str))


----------------------
Value|Data|11/06/2012
0

Here is a quick way to do it.

CREATE FUNCTION [dbo].StringSplitXML
(
    @String  VARCHAR(MAX), @Separator CHAR(1)
)
RETURNS @RESULT TABLE(id int identity(1,1),Value VARCHAR(MAX))
AS
BEGIN    
 DECLARE @XML XML
 SET @XML = CAST(
    ('<i>' + REPLACE(@String, @Separator, '</i><i>') + '</i>')
    AS XML)

 INSERT INTO @RESULT
 SELECT t.i.value('.', 'VARCHAR(MAX)') 
 FROM @XML.nodes('i') AS t(i)
 WHERE t.i.value('.', 'VARCHAR(MAX)') <> ''

 RETURN
END
GO
SELECT * FROM dbo.StringSplitXML( 'Value|Data|4|Z|11/06/2012','|')
WHERE id not in (3,4)

Note that using a UDF will slow things down, so this solution should be considered only if you have a reasonably small data set to work with.

Sparky
  • 14,967
  • 2
  • 31
  • 45
  • hi thanks for taking the time out to help. I should (and I will edit my question to reflect this) note that I can't guarantee that the string won't contain invalid xml characters like '<' or '&' – Dave Jan 29 '19 at 16:53
  • 1
    If that is the case, there are lots of string split functions floating around that don't use XML. As long as the function you use returns a numeric position id and the value, you should be able to use the above concept, returning the table and filtering it with a WHERE clause. Check the link I just saw as a comment, I am sure there are some good string split functions there. – Sparky Jan 29 '19 at 16:57
  • `Note that using a UDF will slow things down, so this solution should be considered only if you have a reasonably small data set to work with.` The UDF will not slow things down when written correctly! T-SQL UDF's that include a BEGIN / END logic are horrible because they are not inline; Inline UDFs can be blazing fast. To improve your splitter function I would suggest: 1. Re-write it as an inline table valued function 2. Change the Return Type to VARCHAR(8000); MAX datatypes should only be used when absolutely necessary. PS - XML Splitters are slow even when inline. – Alan Burstein Jan 29 '19 at 23:55