SQL Parse out multiple substrings

Question

I have a very long and complex string that comes in with New line breaks - I am having a hard time parsing. I need to be able to create a select query with a column for each of the below fields.

The ideal would be find the new line break - for each line - go back to the : everything before the colon should be the name of the column, and everything between : and new ling break should be the data in the field.

All the data is returned as strings so i'm just building a select statement for each of the below lines. I'm not sure if that is at all possible.

The second alternative, to hard code and say something like CHARINDEX ( 'Home Phone:' ,notes, 0) Where I find the home phone string and then pull everything between the : and the new ling break after specifying the string.

In this case each select item in my query will say - find string "Home Phone" and pull what comes after the colon, or find string "School Name" etc.

This is what the data looks like (in one all string called notes):

Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc

so the output looks something like this (with all of the long questions answered as well in each field).

Home Phone  Cell Phone  Date of Birth:  …   Type:               Question 1 :                Question 2:    Question 3: 
1234567890  1234567890  1/1/1971            Public/Charter      aaaaaaaa aaaaaaaaaaaaa.     bbb bbbbbbbbbb ccccccccccccccccccccccc

I'm not sure if that makes sense -- but any and all suggestions are really appreciated.

Code to pull the substring and the new line char -- but this is hard coded. I can't figure out how to do it dynamically.

SELECT  ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: '))) as 'beggining',
        ltrim(rtrim(CHARINDEX ( CHAR(10) ,notes, 0)))   as 'ending',
        SUBSTRING(notes,(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')),(LEN('Home Phone: '))) as 'home phone',    
FROM    table a

Thank you!

Is the problem that you don't know how to identify a newline character in a sql string? — Tab Alleman, Feb 29 '16 at 19:30
Possible duplicate of [Finding a line break in a text string in a SQL table?](http://stackoverflow.com/questions/13075079/finding-a-line-break-in-a-text-string-in-a-sql-table) — Tab Alleman, Feb 29 '16 at 19:46
I can find the starting position `select ltrim(rtrim(CHARINDEX ( 'Home Phone:' ,notes, 0) + LEN('Home Phone: ')))`, I'm not sure how I print out what is from this point to the new line break. — Elizabeth, Feb 29 '16 at 19:49
Do you have a fixed number of rows of data or can it vary? I am thinking you may need to use a string splitter to break this first into rows, then again into columns. — Sean Lange, Feb 29 '16 at 19:56
don't do it in sql. text processing in sql is hideiously painful/ugly. do this client-side, and saved the parsed data into a properly normalized data. — Marc B, Feb 29 '16 at 19:59
@SeanLange well for the year its fixed, next year it may change. so if there is a way to make it dynamic then that would be ideal. otherwise we can hard coded each of the rows and change them season by season. — Elizabeth, Feb 29 '16 at 20:14
So then the answer is that it needs to be dynamic. When the number of datapoints might change it requires a dynamic solution so you don't have to change the code over and over. That is kind of the point of writing code, you shouldn't have to change it all the time. — Sean Lange, Feb 29 '16 at 20:19
If you can identify the starting position, and you've looked at the duplicate and know how to find a newline character, then all you need is SUBSTRING(). — Tab Alleman, Feb 29 '16 at 20:19
@SeanLange yes, that is the ideal situation and what i'm hoping to accomplish. But I can find string lengths and get sub-strings. But I have no idea how to make it dynamic. — Elizabeth, Feb 29 '16 at 20:21
@TabAlleman - with hard coding I can, stating - to find the length various strings and using charindex - I can find a starting position. but i dont know how to find an in between zone. (like ending position). and even more so I don't know how to find it dynamically without specifying names - "home phone" ... any suggestions are all appreciated. — Elizabeth, Feb 29 '16 at 20:23
@SeanLange first comment was the best suggestion. Use a split function to split on NewLine character. Then use another to split each line on the colon character. Done. — Tab Alleman, Feb 29 '16 at 20:50
Here are some great examples of some string splitters. http://sqlperformance.com/2012/07/t-sql-queries/split-strings Basically you will need to split first on char(10). Then split each value on ':' to get the column and the values. — Sean Lange, Feb 29 '16 at 20:59

score 2 · Accepted Answer · edited May 23 '17 at 12:23

A lot of this credit (90%) should go to Alex K who provided an indepth answer regarding finding the nth occurrence of a character

SQL Server - find nth occurrence in a string

I took that answer, adjusted it for your question and then applied a PIVOT to break it into the desired rows/columns. This method should be able to create the desired output for as many unique question sets as you need provided they always have the same logic (each question/answer separated by a line break).

--Creates temporary table for testing, ID column and second set of data
--used to ensure query works for each unique set of questions
IF OBJECT_ID('tempdb..#Results') IS NOT NULL
    DROP TABLE #Results

CREATE TABLE #Results 
    (ID INT IDENTITY(1,1) NOT NULL,
    Notes NVARCHAR(4000) NOT NULL)
INSERT INTO #Results
    (Notes)
VALUES
    ('Home Phone: 1234567890  
    Cell Phone: 1234567890  
    Date of Birth: 01/01/1971 
    School Name: James Jones High  School 
    Address:123 Main Street 
    School City: Queens  
    School State: PA  
    School Zip: 32112 
    Years Teaching: 12  
    Grade Levels: Middle School  
    Total Students: 120  
    Subject: Music:   
    How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
    Type: Public/Charter '),
    ('Home Phone: test  
    Cell Phone: test 
    Date of Birth: test
    School Name: test
    Address:test 
    School City: test 
    School State: test  
    School Zip: test 
    Years Teaching: test 
    Grade Levels: test 
    Total Students: test
    Subject: test   
    How did they hear:  test 
    Type: test ');

--Recursive CTE to determine the position of each successive line break
--Used CHARINDEX to search CHAR(13) and CHAR(10) and find line breaks and carriage returns
WITH cte
AS

    (SELECT ID, Notes, 1 AS Starts, CHARINDEX(CHAR(13)+CHAR(10),Notes) AS Pos
    FROM #Results
    UNION ALL
    SELECT ID, Notes, Pos +1, CHARINDEX(CHAR(13)+CHAR(10),Notes,Pos+1) AS Pos
    FROM cte
    WHERE
        pos >0),

--2nd CTE breaks each question set into it's own row
cte2
AS
    (SELECT ID, Notes,Starts, Pos,
        SUBSTRING(Notes, Starts,
            CASE
                WHEN pos > 0 THEN (pos - starts)
                ELSE LEN(notes)
            END) AS Token
    FROM cte),

--3rd CTE cleans up the data, separating the Questions/Answers into separate columns
--REPLACE is used to remove Line Break (CHAR(10)), output was then showing a TAB so used
--double REPLACE and removed CHAR(9) (tab)
--LTRIM removes leading space
cte3
AS
    (SELECT ID, 
        LTRIM(REPLACE(REPLACE(SUBSTRING(Token,CHARINDEX(CHAR(13)+CHAR(10),Token),CHARINDEX(':',Token)),CHAR(10),''),CHAR(9),'')) AS Question, 
        LTRIM(SUBSTRING(Token,CHARINDEX(':',Token)+1,4000)) AS Answer
    FROM cte2)

--Pivot separates each Question/Answer row into it's own column
SELECT *
FROM
    (SELECT ID, Question, Answer
    FROM cte3) AS a
PIVOT
    (MAX(Answer)
    FOR [Question] IN([Address],[Cell Phone],[Date of Birth],[Grade Levels],[Home Phone],[How did they hear],
                        [School City],[School Name],[School State],[School Zip],[Subject],[Total Students],[Type],[Years Teaching])) AS pvt

I put comments on each section to hopefully explain my logic but let me know if you have any questions.

EDIT: Dynamic Pivot

It is possible to use dynamic SQL to create a PIVOT that will automatically pick up on all the "Question" columns and adjust accordingly. I do not believe it can be done in one step since I had to use the multiple CTEs. What I would do is take the above steps used to create CTE, CTE2, and CTE3 (basically everything before the PIVOT query) and CREATE a VIEW of those steps, then with that view do the following (for my example the view is called "Questionaire")

DECLARE @columns AS NVARCHAR(MAX)
DECLARE @query AS NVARCHAR(MAX)

SET @columns =  STUFF((SELECT DISTINCT ',' + QUOTENAME(q.question)
        FROM questionaire AS q
        FOR XML PATH(''), TYPE
        ).value('.','NVARCHAR(MAX)')
        ,1,1,'')

SET @query =    'SELECT ID, '+ @columns +' FROM
        (
            SELECT ID, Answer, Question
            FROM questionaire
        ) AS a
        PIVOT
        (
            MAX(Answer)
            FOR Question IN(' +@columns+')
        ) AS p'
EXECUTE(@query)

This is working great -- I wanted to ask, I added/modified the logic a bit. at the end I created a temp table `SELECT distinct ('[' +Question +']') as 'Question'` into which I will place the column headers and i'm trying to add that to my final pivot. In the pivot For statement I have `FOR [Question] IN( Select Question from #temp )) AS p` and it seems to not be happy. — Elizabeth, Mar 01 '16 at 17:52
Yeah, PIVOT doesn't allow you to declare the columns using a SELECT statement like that. It is possible using Dynamic SQL, I edited my answer to provide a query for how I would achieve that. — Jericho, Mar 02 '16 at 04:25
My view has to have all three cte's? `CREATE VIEW questionaire AS WITH cte1 AS (...), cte2 AS (...), cte3 AS (...) SELECT ...` — Elizabeth, Mar 02 '16 at 18:04
Also one other question - is it detrimental that instead of a view I create a temp table (#questionable) and drop the contents of cte3 into it and pull the query from there? — Elizabeth, Mar 02 '16 at 18:41
The temp table should work fine - I really only use temp tables when answering Stack Overflow questions so I didn't even think about it. :) — Jericho, Mar 02 '16 at 21:42
Great this works perfectly (and then we realized the only way we can build the report was if we know the columns in advance) so we will keep this logic for internal use, and then stop it at the point -- before the pivot table, and send the non pivoted data to the visual studio SSRS package to pivot it in the system. (so we can make it dynamic that way). Thank you again for all your help! — Elizabeth, Mar 02 '16 at 21:56

score 0 · Answer 2 · answered Feb 29 '16 at 21:24

I know that many people around here don't like this splitter but it is the one I prefer. It can only handle input values up to 8000 and the delimiter is only a single character. However, it has some nice things some of the other splitters don't and unless you have huge volumes of input it is plenty for nearly everything. You can find the code here. http://www.sqlservercentral.com/articles/Tally+Table/72993/ The comments (requires login) runs for many pages and has a very lengthy discussion this splitter.

Then other people prefer to use a pivot this kind of thing, I much prefer the crosstab (also known as conditional aggregation) as I find the syntax far less obtuse.

I took the liberty of modifying your sample data slightly. I changed the value of cell phone so it wasn't the same as home phone. I also shortened the responses for the questions as they didn't need to be hundreds of characters to demonstrate the technique.

declare @SomeValue varchar(8000)

set @SomeValue = 'Home Phone: 1234567890  
Cell Phone: 3344556677
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music:   
How did they hear:  Other, provide more info: Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa.
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb
Question 3: ccccccccccccccccccccccc cccccccc';

select 
    MAX(case when s.ItemNumber = 1 then x.Item end) as HomePhone
    , MAX(case when s.ItemNumber = 2 then x.Item end) as DOB
    , MAX(case when s.ItemNumber = 3 then x.Item end) as DOB
    , MAX(case when s.ItemNumber = 4 then x.Item end) as SchoolName
    , MAX(case when s.ItemNumber = 5 then x.Item end) as SchoolAddress
    , MAX(case when s.ItemNumber = 6 then x.Item end) as SchoolCity
    , MAX(case when s.ItemNumber = 7 then x.Item end) as SchoolState
    , MAX(case when s.ItemNumber = 8 then x.Item end) as SchoolZip
    , MAX(case when s.ItemNumber = 9 then x.Item end) as YearsTeaching
    , MAX(case when s.ItemNumber = 10 then x.Item end) as GradeLevels
    , MAX(case when s.ItemNumber = 11 then x.Item end) as TotalStudents
    , MAX(case when s.ItemNumber = 12 then x.Item end) as Subject
    , MAX(case when s.ItemNumber = 13 then x.Item end) as HowHeard
    , MAX(case when s.ItemNumber = 14 then x.Item end) as SchoolType
    , MAX(case when s.ItemNumber = 15 then x.Item end) as Question1
    , MAX(case when s.ItemNumber = 16 then x.Item end) as Question2
    , MAX(case when s.ItemNumber = 17 then x.Item end) as Question3
from dbo.DelimitedSplit8K(@SomeValue, CHAR(10)) s
cross apply dbo.DelimitedSplit8K(s.Item, ':') x

score 0 · Answer 3 · answered Mar 01 '16 at 05:46

You can try xml like this, but I get rid of extra : after music and provide more info.

DECLARE @string nvarchar(max) = '
Home Phone: 1234567890  
Cell Phone: 1234567890  
Date of Birth: 01/01/1971 
School Name: James Jones High  School 
Address:123 Main Street 
School City: Queens  
School State: PA  
School Zip: 32112 
Years Teaching: 12  
Grade Levels: Middle School  
Total Students: 120  
Subject: Music   
How did they hear:  Other, provide more info, Former partner teacher in the Middle School 
Type: Public/Charter   
Question 1: aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaa aaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaaaaaaaa aaaaaaaa aaaaaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaa aaaaaa aaaaaa aaaaaaa aaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaa aaaaa aaaaaa aaaaaa aaaaaaaaaaaa aaaaaaaaaaaa aaa aaaa aaaaa aaaaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaaaa aaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaaa aaaaaaaaaaa aaaaaaaaa aaaaaaaaaaaa.   
Question 2: bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbb bbbbbbbbb bbbbbbb bbbbbb bbbbbb bbbbbbb  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbbbb bbbbbbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbb 
Question 3: ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccccccccccccccccc cccccccccccccccccc ccccccccccc ccccccccccccc ccccccccccccccccc cccccccc'
,@xml as xml

SELECT @xml = REPLACE ('<mystring><fieldname id="'+REPLACE(REPLACE(right(@string,LEN(@string)-2),':','" >'),CHAR(10),'</fieldname><fieldname id="')+'</fieldname></mystring>' ,CHAR(13),'')

SELECT
    n.v.value('(fieldname[@id="Home Phone"])[1]','NVARCHAR(11)') AS 'Home Phone',
    n.v.value('(fieldname[@id="Cell Phone"])[1]','NVARCHAR(11)') AS 'Cell Phone',
    n.v.value('(fieldname[@id="Date of Birth"])[1]','NVARCHAR(12)') AS 'Date of Birth',
    n.v.value('(fieldname[@id="School Name"])[1]','NVARCHAR(30)') AS 'School Name',
    n.v.value('(fieldname[@id="Address"])[1]','NVARCHAR(30)') AS 'Address',
    n.v.value('(fieldname[@id="School City"])[1]','NVARCHAR(15)') AS 'School City',
    n.v.value('(fieldname[@id="School State"])[1]','NVARCHAR(10)') AS 'School State',
    n.v.value('(fieldname[@id="School Zip"])[1]','NVARCHAR(6)') AS 'School Zip',
    n.v.value('(fieldname[@id="Years Teaching"])[1]','NVARCHAR(5)') AS 'Years Teaching',
    n.v.value('(fieldname[@id="Grade Levels"])[1]','NVARCHAR(15)') AS 'Grade Levels',
    n.v.value('(fieldname[@id="Total Students"])[1]','NVARCHAR(5)') AS 'Total Students',
    n.v.value('(fieldname[@id="How did they hear"])[1]','NVARCHAR(100)') AS 'How did they hear',
    n.v.value('(fieldname[@id="Type"])[1]','NVARCHAR(25)') AS 'Type',
    n.v.value('(fieldname[@id="Question 1"])[1]','NVARCHAR(128)') AS 'Question 1',
    n.v.value('(fieldname[@id="Question 2"])[1]','NVARCHAR(128)') AS 'Question 2',
    n.v.value('(fieldname[@id="Question 3"])[1]','NVARCHAR(128)') AS 'Question 3'
FROM @xml.nodes('mystring') as n(v);

The result:

    Home Phone  Cell Phone  Date of Birth School Name                    Address                        School City     School State School Zip Years Teaching Grade Levels    Total Students How did they hear                                                                                    Type                      Question 1                                                                                                                       Question 2                                                                                                                       Question 3
----------- ----------- ------------- ------------------------------ ------------------------------ --------------- ------------ ---------- -------------- --------------- -------------- ---------------------------------------------------------------------------------------------------- ------------------------- -------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------
 1234567890  1234567890  01/01/1971    James Jones High  School      123 Main Street                 Queens          PA           32112      12             Middle School   120             Other, provide more info, Former partner teacher in the Middle School                               Public/Charter            aaaaaaaa aaaaaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaaaaaaa aaaaaaaaa aaaaaaa aaaa aaa aaaaaaaa aaaaaa aaaaaaaa  aaaaaaaaaaaaaaaaa  bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb bbbbbbbbbbbb bbbbbbb bbb bbbbbbbbbb bbbbbbbbbbbbbbbbb bbbbbbbbbbbbbbbbbbb   ccccccccccccccccccccccc cccccccc ccccccccccc cccccccccccccccccccccc ccc ccccccccc cccccccccccccc ccccccccccccccccccccc cccccccc

(1 row(s) affected)

SQL Parse out multiple substrings

3 Answers3

Linked