Splitting sql pipe separated

Question

I have table as follows

Discese

ID | DisceseNAme
1  |  Heart
2  |  Lungs
3  |  ENT

Registration

PatienID |  NAME | Discease 
1        | abc   |  1 
2        |  asa  |  2|3
3        |  asd  |  1|2|3

I have a function to split |-separated data. Now I want result as:

PatientID | Name |   DisceseNAme
1         | abc  |   heart
2         |asa   |   Lungs,ENT
3         |asd   |   heart,Lungs,ENT

My split function is

ALTER FUNCTION [dbo].[fnSplit](
    @sInputList VARCHAR(8000) -- List of delimited items
  , @sDelimiter VARCHAR(8000) = '|' -- delimiter that separates items
) RETURNS @List TABLE (item VARCHAR(8000))

BEGIN
DECLARE @sItem VARCHAR(8000)
WHILE CHARINDEX(@sDelimiter,@sInputList,0) <> 0
 BEGIN
 SELECT
  @sItem=RTRIM(LTRIM(SUBSTRING(@sInputList,1,CHARINDEX(@sDelimiter,@sInputList,0)-1))),
  @sInputList=RTRIM(LTRIM(SUBSTRING(@sInputList,CHARINDEX(@sDelimiter,@sInputList,0)+LEN(@sDelimiter),LEN(@sInputList))))

 IF LEN(@sItem) > 0
  INSERT INTO @List SELECT @sItem
 END

IF LEN(@sInputList) > 0
 INSERT INTO @List SELECT @sInputList -- Put the last item in
RETURN
END

I am not sure how I can get that result, though.

My answer is that your should normalize the `Registration` table rather than using pipe-separated data. — Tim Biegeleisen, Mar 10 '16 at 06:49
This is an object lesson in why you use normalized data, not denormalized data, in your tables. If your data used one single number for each 'disease' (no c's in the word in English), and had 6 lines of data, then you wouldn't need the split function. Allowing 8000 characters for the field delimiter seems a bit excessive, too. Do you have a 'GROUP_CONCAT' aggregate available to you? — Jonathan Leffler, Mar 10 '16 at 06:50
@JonathanLeffler `GROUP_CONCAT` does not exist in SQL Server. The concensus is using `FOR XML PATH('')` for a `select ','+column ...` (without a column name), and `STUFF` to remove the leading comma. — TT., Mar 10 '16 at 06:53
@TT. — Thanks. I suspected as much. Is there another question which is applicable to SQL Server and covers the processing needed with FOR XML and STUFF? If so, we can perhaps close this as a duplicate of that. — Jonathan Leffler, Mar 10 '16 at 07:01
@JonathanLeffler I am going to answer with how to do this with normalized tables. IMO it's better to lead the OP down the right path of structuring tables. — TT., Mar 10 '16 at 07:10
@JonathanLeffler There are probably at least 50 duplicates of "how to concatenate multiple rows into a column" on here for SQL Server ;-). And for the record, a WHILE loop should _not_ be used to do a split. There are probably well over 100 duplicates on here for how to split a string in SQL Server. Finally, it might be possible to do something with INNER JOIN or CROSS APPLY to do an inline REPLACE of the `,1,` into the lookup value and skip the split + concat entirely (possibly). Might also help to spell-check "disease" :-). But yes, 100% agree about normalizing being much preferred. — Solomon Rutzky, Mar 10 '16 at 07:10
@srutzky: neither of your statistics surprises me — it's about what I'd expect. I'm just being lazy and hoping someone else will go find the most appropriate match. I don't "do" SQL Server as such, so I'm not sure I'd pick the best answer for MS SQL Server 2008 (the best answer might not be tagged with any version, or might be tagged with a different version but still be applicable). — Jonathan Leffler, Mar 10 '16 at 07:13
@JonathanLeffler You're not being lazy, you are being quite helpful. I tagged you because you mentioned GROUP_CONCAT and so I was adding to TT's reply to that. Sadly, the O.P. is the one being lazy. Given how many duplicates there are of both questions on here, clearly no "real" attempt at research was done. Still, I should mention that you can do GROUP_CONCAT via SQLCLR (.NET code in SQL Server). There is an open-source GROUP_CONCAT project somewhere, and I wrote a library, [SQL#](http://SQLsharp.com), that has a free version with **Agg_Join** that does this :-). — Solomon Rutzky, Mar 10 '16 at 07:21

score 2 · Accepted Answer · edited May 23 '17 at 12:23

As already mentioned in the comments, it is better to normalize your table structure. What this means is that you should not store patient's diseases in one VARCHAR column with disease ID's separated with some character. Instead you should store all diseases for a patient in separate rows.

If you keep using the setup you have now, your queries will become real cumbersome and performance will be really bad. Also, you will not be able to enjoy database consistency by using foreign keys.

I've written this example script which finally selects for the output you require. The example uses temporary tables. If you choose to use this way of working (and you should), just use this setup with regular tables (ie not starting with #).

The tables:

#disease: Defines diseases
#patients: Defines patients
#registration: Defines patients' diseases; foreign keys to #disease and #patients for data consistency (make sure the patients and diseases actually exist in the database)

If you're wondering how the FOR XML PATH('') construct in the final query results in a |-separated VARCHAR, read this answer I gave a while ago on this subject.

-- Diseases
CREATE TABLE #disease(
    ID INT,
    DiseaseName VARCHAR(256),
    CONSTRAINT PK_disease PRIMARY KEY(ID)
);
INSERT INTO #disease(ID,DiseaseName)VALUES
    (1,'Heart'),(2,'Lungs'),(3,'ENT');

-- Patients
CREATE TABLE #patients(
    PatientID INT,
    Name VARCHAR(256),
    CONSTRAINT PK_patients PRIMARY KEY(PatientID)
);
INSERT INTO #patients(PatientID,Name)VALUES
    (1,'abc'),(2,'asa'),(3,'asd'),(4,'zldkzld');

-- Registration for patient's diseases
CREATE TABLE #registration(
    PatientID INT,
    Disease INT,
    CONSTRAINT FK_registration_to_patient FOREIGN KEY(PatientID) REFERENCES #patients(PatientID),
    CONSTRAINT FK_registration_to_disease FOREIGN KEY(Disease) REFERENCES #disease(ID),
);
INSERT INTO #registration(PatientID,Disease)VALUES
    (1,1),             -- patient with ID 1 has one disease: Heart
    (2,2),(2,3),       -- patient with ID 2 has two diseases: Lungs and ENT
    (3,1),(3,2),(3,3); -- patient with ID 3 has three diseases: Heart, Lungs and ENT

-- Select diseases for partients in one |-separated column
SELECT
    p.PatientID,p.Name,Diseases=STUFF(dn.diseases,1,1,'')
FROM
    #patients AS p
    CROSS APPLY ( -- construct a |-separated column with all diseases for the client
        SELECT
            '|'+d.DiseaseName
        FROM
            #registration AS r
            INNER JOIN #disease AS d ON
                d.ID=r.Disease
        WHERE
            r.PatientID=p.PatientID
        FOR
            XML PATH('')
    ) AS dn(diseases)
WHERE
    EXISTS(SELECT 1 FROM #registration AS r WHERE r.PatientID=p.PatientID)
ORDER BY
    p.PatientID;

DROP TABLE #disease;DROP TABLE #registration;DROP TABLE #patients;

Results:

+-----------+------+-----------------+
| PatientID | Name |    Diseases     |
+-----------+------+-----------------+
|         1 | abc  | Heart           |
|         2 | asa  | Lungs|ENT       |
|         3 | asd  | Heart|Lungs|ENT |
+-----------+------+-----------------+

@Fattycoder It's good to hear you agree that normalizing the table structure is the right path =) — TT., Mar 10 '16 at 07:55

Splitting sql pipe separated

1 Answers1