1

In SQL Server I have a field that has delimited data (by space) in it.

E.g.

recid| Delimited data field
1| 1 2 3 4 5
2| 1 2 3 3 5
3| 1 1 1 1 1

I need to loop through all the records in the DB and interrogate the delimited data field and compare the third and fourth parts of data against each other and if they match, return the recid and the whole delimited field.

So from my example records 2 and 3 have matching data parts, so it would return:-

2|1 2 3 3 5
3|1 1 1 1 1

Because 3 3 matches, as does 1 1.

Thanks.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
Ian Thompson
  • 187
  • 1
  • 11
  • 8
    Fix your data model! You should not be storing numbers as strings. Nor should you be storing multiple values in a single column. I would put the effort into fixing the data rather than trying to figure out this one problem. – Gordon Linoff Feb 11 '19 at 12:39
  • 2
    Is your data always in 5 parts? Could it be less/more? What have you tried? *(Gordan is right though, you should be fixing your structure, if you can do that, the volunteers here will be happy to show you how.)* – Thom A Feb 11 '19 at 12:39
  • 1
    Tables have _columns_, not fields. – jarlh Feb 11 '19 at 12:40
  • 1
    And they're *rows* not records. – Thom A Feb 11 '19 at 12:52

6 Answers6

2

If it is always 1 digit and same format, you can try like following.

select * from @table
where SUBSTRING([data], 5, 1) = SUBSTRING([data], 7, 1)

If not (Numbers are not single digit), you can try like following.

;WITH cte 
     AS (SELECT F1.recid, 
                F1.[data], 
                O.splitdata, 
                Row_number() 
                  OVER( 
                    partition BY recid 
                    ORDER BY (SELECT 1)) rn 
         FROM   (SELECT *, 
                        Cast('<X>' + Replace(F.data, ' ', '</X><X>') + '</X>' AS 
                             XML) 
                        AS 
                                xmlfilter 
                 FROM   @table F)F1 
                CROSS apply (SELECT fdata.d.value('.', 'varchar(50)') AS 
                                    splitdata 
                             FROM   f1.xmlfilter.nodes('X') AS fdata(d)) O) 
SELECT c1.recid, 
       c1.data 
FROM   cte c1 
       INNER JOIN cte c2 
               ON c1.recid = c2.recid 
                  AND c1.rn = 3 
                  AND c2.rn = 4 
                  AND c1.splitdata = c2.splitdata 
GROUP  BY c1.recid, 
          c1.data 

Online Demo

PSK
  • 17,547
  • 5
  • 32
  • 43
  • This **assumes** that the delimited data only has 1 character. That could be a costing assumption. – Thom A Feb 11 '19 at 12:51
  • @Larnu, i agree, that's why I mentioned in the answer. I will update for the different approach – PSK Feb 11 '19 at 12:52
  • I know, but I would suspect it very unlikely that the values are only going to be `0-9`. – Thom A Feb 11 '19 at 12:53
  • @Larnu, i have updated the answer for the other scenario also. – PSK Feb 11 '19 at 13:09
  • Excellent thank you very much @PSK how would I get back the first 4 row numbers. e.g 1 2 3 3 1 1 1 1 ? – Ian Thompson Feb 11 '19 at 16:28
  • I am sorry I am not getting your question, can you please explain it in more detail. – PSK Feb 11 '19 at 16:38
  • so if I had '1 22 33 33 44' and that matches on '33 33', how would I select just the first 3 parts of the column, e.g. '1 22 33' ? – Ian Thompson Feb 11 '19 at 16:56
  • can you check this https://rextester.com/KMX12033 you can also upvote the answer if you like :) – PSK Feb 11 '19 at 17:06
0

Need to split the data, give the row number and then compare.

Schema:

SELECT *  INTO #TAB FROM (
SELECT 1, '1 2 3 4 5' UNION ALL
SELECT 2, '1 2 3 3 5' UNION ALL
SELECT 3, '1 1 1 1 1'  
)A (recid , Delimited_data_field)

Solution :

;WITH CTE
AS (
    SELECT recid
        ,Delimited_data_field
        ,ROW_NUMBER() OVER (PARTITION BY recid ORDER BY (SELECT 1)) RNO
        ,splt.X.value('.', 'INT') VAL
    FROM (
        SELECT recid
            ,Delimited_data_field
            ,CAST('<M>' + REPLACE(Delimited_data_field, ' ', '</M><M>') + '</M>' AS XML) DATA
        FROM #TAB
        ) A
    CROSS APPLY A.DATA.nodes('/M') splt(x)
    )
SELECT C.recid
    ,C2.Delimited_data_field
FROM CTE C
INNER JOIN CTE C2 ON C.recid = C2.recid AND C.RNO = 3 AND C2.RNO = 4
AND C.VAL = C2.VAL 

Result :

recid   Delimited_data_field
2       1 2 3 3 5
3       1 1 1 1 1
Shakeer Mirza
  • 5,054
  • 2
  • 18
  • 41
0

Your question has two parts, find nth split and then compare. Your first approach should be to break the problem until you find built in functions that can do the job. here is one method inner query return result after split and outer compares:

SELECT recid,Delimited from (
        SELECT recid,Delimited, SUBSTRING(Delimited, 
              charindex(' ', Delimited, (charindex(' ', Delimited, 1))+2)+1,1) 
              third, SUBSTRING(Delimited, charindex(' ',Delimited, 
              (charindex(' ', Delimited, 1))+3)+1,1) 
              fourth FROM YourTable) tr
        WHERE third = fourth

See simple substring and charindex can do the job.

abdul qayyum
  • 535
  • 1
  • 17
  • 39
0

Here is one more solution to that.

I tweaked the split function in this link (T-SQL: Opposite to string concatenation - how to split string into multiple records) a bit to make it usefule in your scenario.

Here is the function.

CREATE FUNCTION dbo.SplitAndGetNumberAt (@sep char(1), @s varchar(512), @pos int)
RETURNS INT
BEGIN
declare @val as varchar(10);

WITH Pieces(pn, start, stop) AS (
    SELECT 1, 1, CHARINDEX(@sep, @s)
    UNION ALL
    SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
    FROM Pieces
    WHERE stop > 0
)
SELECT @val = SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END)
FROM Pieces where pn = @pos;

RETURN @val
END

Now you can use this function to get 3rd and 4th position of numbers and compare easily.

select recid, deldata
from so1
where dbo.SplitAndGetNumberAt (' ', deldata, 3) = dbo.SplitAndGetNumberAt (' ', deldata, 4)

Hope it will help.

Roopesh
  • 279
  • 2
  • 7
0

If you have SQL Server 2016 or higher, you may try one approach using OPENJSON() to split your input data. The important part here is the fact, that when OPENJSON parses a JSON array the indexes of the elements in the JSON text are returned as keys (0-based).

Input:

CREATE TABLE #Table (
   RecId int,
   Data varchar(max)
)
INSERT INTO #Table
   (RecId, Data)
VALUES 
   (1, '1 2 3 4 5'),
   (2, '1 2 3 3 5'),
   (3, '1 1 1 1 1')

Statement:

SELECT 
   t.RecId,
   t.Data
FROM #Table t
CROSS APPLY (SELECT [value] FROM OPENJSON('["' +  REPLACE(t.Data,' ','","') + '"]') WHERE [key] = 2) j3
CROSS APPLY (SELECT [value] FROM OPENJSON('["' +  REPLACE(t.Data,' ','","') + '"]') WHERE [key] = 3) j4
WHERE j3.[value] = j4.[value]

Output:

RecId   Data
2       1 2 3 3 5
3       1 1 1 1 1
Zhorov
  • 28,486
  • 6
  • 27
  • 52
0

Just for fun, sort of crazy coding:

DECLARE @Table Table (
    recid               INT,
    DelimitedDataField  VARCHAR(32)
)

INSERT @Table (recid, DelimitedDataField)
VALUES
    (1, '1 2 3 4 5'),
    (2, '1 2 3 3 5'),
    (3, '1 1 1 1 1')

SELECT *
FROM @Table
WHERE
SUBSTRING (
    STUFF(
        STUFF(
            DelimitedDataField + ' - - -',
            1,
            CHARINDEX(' ', DelimitedDataField + ' - - -'),
            ''
        ),
        1,
        CHARINDEX(' ', STUFF(
                        DelimitedDataField + ' - - -',
                        1,
                        CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                 ),
        ''),
    1,
    CHARINDEX(' ', STUFF(
        STUFF(
            DelimitedDataField + ' - - -',
            1,
            CHARINDEX(' ', DelimitedDataField + ' - - -'),
            ''
        ),
        1,
        CHARINDEX(' ', STUFF(
                        DelimitedDataField + ' - - -',
                        1,
                        CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                 ),
        '')
        )
) = 
SUBSTRING (
        STUFF(
            STUFF(
                STUFF(
                    DelimitedDataField + ' - - -',
                    1,
                    CHARINDEX(' ', DelimitedDataField + ' - - -'),
                    ''
                ),
                1,
                CHARINDEX(' ', STUFF(
                                DelimitedDataField + ' - - -',
                                1,
                                CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                         ),
                ''),
            1,
            CHARINDEX(' ', STUFF(
                STUFF(
                    DelimitedDataField + ' - - -',
                    1,
                    CHARINDEX(' ', DelimitedDataField + ' - - -'),
                    ''
                ),
                1,
                CHARINDEX(' ', STUFF(
                                DelimitedDataField + ' - - -',
                                1,
                                CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                         ),
                '')
            ),
            ''
        ),
        1,
        CHARINDEX(' ',      STUFF(
            STUFF(
                STUFF(
                    DelimitedDataField + ' - - -',
                    1,
                    CHARINDEX(' ', DelimitedDataField + ' - - -'),
                    ''
                ),
                1,
                CHARINDEX(' ', STUFF(
                                DelimitedDataField + ' - - -',
                                1,
                                CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                         ),
                ''),
            1,
            CHARINDEX(' ', STUFF(
                STUFF(
                    DelimitedDataField + ' - - -',
                    1,
                    CHARINDEX(' ', DelimitedDataField + ' - - -'),
                    ''
                ),
                1,
                CHARINDEX(' ', STUFF(
                                DelimitedDataField + ' - - -',
                                1,
                                CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                         ),
                '')
            ),
            ''
        ))
)

AND SUBSTRING (
    STUFF(
        STUFF(
            DelimitedDataField + ' - - -',
            1,
            CHARINDEX(' ', DelimitedDataField + ' - - -'),
            ''
        ),
        1,
        CHARINDEX(' ', STUFF(
                        DelimitedDataField + ' - - -',
                        1,
                        CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                 ),
        ''),
    1,
    CHARINDEX(' ', STUFF(
        STUFF(
            DelimitedDataField + ' - - -',
            1,
            CHARINDEX(' ', DelimitedDataField + ' - - -'),
            ''
        ),
        1,
        CHARINDEX(' ', STUFF(
                        DelimitedDataField + ' - - -',
                        1,
                        CHARINDEX(' ', DelimitedDataField + ' - - -'), '')
                 ),
        '')
        )
) <>'-'
Denis Rubashkin
  • 2,151
  • 1
  • 9
  • 14