3

I have a column on a SQL Server table that has a lengthy value with line breaks. I am trying to find the values before each line break.

EXAMPLE:

--Column name: ItemDescription

Case Qty: 12
Weight: 8 oz.
Flavor code: PB
Size: STOCK
Cut: 1/8" x 1/8" x 3/16"
Additions: Bells
Cover Brine #: P1
Kosher Cert: OU
Organic Cert: 

This is EXACTLY what the I get when I copy the cell on my results and paste it. So I converted this field to VARBINARY and saw what ASCII codes are in here. Here is a part of ASCII interpretation of a value:

43 61 73 65 20 51 74 79 3A 20 31 32 0D0A 57 65 69 67 68 74 3A 20 38 20 6F 7A 2E 0D0A 46

0D0A meaning Carriage return and Line feed.

PREFERRED OUTCOME:

Now that the data is clear, I am trying to find the value after the colon and before the line break and put it in a new column.

So here is what my preferred outcome should look like:

enter image description here

WHAT I'VE TRIED:

Here is my current SQL Query to do this:

DECLARE @firstLine int
DECLARE @secondLine int
DECLARE @thirdLine int
DECLARE @fourthLine int
DECLARE @fifthLine int
DECLARE @sixthLine int
DECLARE @seventhLine int
DECLARE @eighthLine int


DECLARE @firstColon int
DECLARE @secondColon int
DECLARE @thirdColon int
DECLARE @fourthColon int
DECLARE @fifthColon int
DECLARE @sixthColon int
DECLARE @seventhColon int
DECLARE @eighthColon int
DECLARE @ninethColon int
DECLARE @itemDesc varchar(MAX)

SELECT 
    @itemDesc = ItemDescription
    ,@firstLine = CHARINDEX(CHAR(13), ItemDescription, 1)
    ,@secondLine = CHARINDEX(CHAR(13), ItemDescription, @firstLine + 1)
    ,@thirdLine =  CHARINDEX(CHAR(13), ItemDescription, @secondLine + 1)
    ,@fourthLine =  CHARINDEX(CHAR(13), ItemDescription, @thirdLine + 1)
    ,@fifthLine =  CHARINDEX(CHAR(13), ItemDescription, @fourthLine+ 1)
    ,@sixthLine =  CHARINDEX(CHAR(13), ItemDescription, @fifthLine + 1)
    ,@seventhLine =  CHARINDEX(CHAR(13), ItemDescription, @sixthLine + 1)
    ,@eighthLine = CHARINDEX(CHAR(13), ItemDescription, @seventhLine + 1)

    ,@firstColon = CHARINDEX(CHAR(58), ItemDescription, 1)--aaa
    ,@secondColon = CHARINDEX(CHAR(58), ItemDescription, @firstLine + 1)
    ,@thirdColon = CHARINDEX(CHAR(58), ItemDescription, @secondLine + 1)
    ,@fourthColon = CHARINDEX(CHAR(58), ItemDescription, @thirdLine + 1)
    ,@fifthColon = CHARINDEX(CHAR(58), ItemDescription, @fourthLine + 1)
    ,@sixthColon = CHARINDEX(CHAR(58), ItemDescription, @fifthLine + 1)
    ,@seventhColon = CHARINDEX(CHAR(58), ItemDescription, @sixthLine + 1)
    ,@eighthColon = CHARINDEX(CHAR(58), ItemDescription, @seventhLine + 1)
    ,@ninethColon = CHARINDEX(CHAR(58), ItemDescription, @eighthLine + 1)
FROM TableName

SELECT 
    ItemDescription
    ,CONVERT(VarBInary, itemDescription)
    ,LTRIM(SUBSTRING(ItemDescription, @firstColon + 2, @firstLine - (@firstColon - 1))) as caseQty          --1
    ,LTRIM(SUBSTRING(ItemDescription, @secondColon + 2, @secondLine - (@secondColon - 1))) as caseQty           --2
    ,LTRIM(SUBSTRING(ItemDescription, @thirdColon + 2, @thirdLine - (@thirdColon - 1))) as FlavorCode
    ,LTRIM(SUBSTRING(ItemDescription, @fourthColon + 2, @fourthLine - (@fourthColon - 1))) as Size
    ,LTRIM(SUBSTRING(ItemDescription, @fifthColon + 2, @fifthLine - (@fifthColon - 1))) as Cut
    ,LTRIM(SUBSTRING(ItemDescription, @sixthColon + 2, @sixthLine - (@sixthColon - 1))) as Additions
    ,LTRIM(SUBSTRING(ItemDescription, @eighthColon + 2, @eighthLine- (@eighthColon - 1))) as Brine
FROM
    TableName

THE ISSUE:

For some reason, the SUBSTRING isn't getting the right substring! I get the Qty and Weight correctly. But if size is RELISHSTOCK, I get RELISHSTOC. For FlavorCode, I get ut: (substring of "Cut:"). For Cut, I get 8" x 3/. For Additions, I get Brin (Substring of "Cover Brine").

WHAT am I doing wrong!! I've tried multiple different combinations of the substring length. I know for a fact it is the length. But now that I realize all the starting positions after flavor code is screwed up, I am not too sure where the mistake is.

Any help would be greatly appreciated.

Thank you.

Crazy Cucumber
  • 479
  • 8
  • 36
  • 1
    T-SQL isn't meant for string manipulation, much less *parsing*. If you want to store complex data store them as XML or Json. It's a *lot* easier to parse data with a language like C# when loading/inserting it – Panagiotis Kanavos Dec 15 '17 at 14:24
  • @PanagiotisKanavos See, that is the thing. The data in this table is simple enough to be able to get out with just T-SQL. I am horrible at XML and don't even want to open that can of worms for myself. But this data isn't going anywhere. It is just going to a SSRS report. So I thought I can make it work like this. – Crazy Cucumber Dec 15 '17 at 14:28
  • Besides, if you already know that the data will have specific fields, why not create a table for it in the first place? Parsing strings with a regex in C# is very easy. Extract the key/value pairs and insert them as field values to the table – Panagiotis Kanavos Dec 15 '17 at 14:29
  • No, the data is not simple at all."going to a SSRS report" is definitely not "just". Reporting needs *cleaned* data. You have a blob that could be interpreted as data. You need to convert it to a table while loading – Panagiotis Kanavos Dec 15 '17 at 14:30
  • 3
    The problem is your data violates 1NF by storing multiple pieces of data in a single tuple. Yours goes even further by storing multiple rows of multiple data points in a single tuple. You have an EAV type pattern stored in a single row. This is a nightmare to deal with sql. – Sean Lange Dec 15 '17 at 14:37
  • 1
    One possible fix (not a solution, the real solution is to change the way to load data and insert it directly into the ) is to convert it to JSON, then use SQL Server's JSON functions to extract attributes. For example, replacing each newline with `','`, each `:` with `' : '` and wrapping everything with `'{'` and `'}` would make it look like JSON. You could then use [JSON_QUERY](https://learn.microsoft.com/en-us/sql/t-sql/functions/json-query-transact-sql) to read the properties. This will be *slow* though. Fix the data – Panagiotis Kanavos Dec 15 '17 at 14:38
  • Thanks for your suggestions @PanagiotisKanavos. I will try that out. Also, the query does not load the report, the query will be used to create a new SQL table. And that SQL table is going to be in the report. But I get what you're saying. – Crazy Cucumber Dec 15 '17 at 14:48
  • @SeanLange I did not think of that. You're right. I will try to go with regex – Crazy Cucumber Dec 15 '17 at 14:49

1 Answers1

2

After about a week of struggling with this problem, and a week of trying to move my execution idea to the suggestions mentioned in the comments, I ended up finding a solution to this problem.

Step 1:
I found out that it was easier to manipulate a special character like ! than it was for line break and line feed (Char(10) and Char(13)).
So the first thing I did was to replace all the line break characters to !. Like so:

REPLACE(REPLACE(columnName, char(10), ''), char(13), '!')

This makes sure that the ! always has a space before it, making it easier to deal with.

Step 2:
I found an answer on SO that had a table-value function to split a string. The function name is fnSplitString and it takes two parameters: @string nvarchar(MAX) and @delimited char(1).

ALTER FUNCTION [dbo].[fnSplitString] 
( 
    @string NVARCHAR(MAX), 
    @delimiter CHAR(1) 
) 
RETURNS @output TABLE(columnName NVARCHAR(MAX), outValue NVARCHAR(MAX)) 
BEGIN 
    DECLARE 
        @start INT
        ,@end INT
        ,@colon INT

    SELECT 
        @start = 1
        ,@colon = CHARINDEX(@delimiter, @string)
        ,@end = CHARINDEX(CHAR(33), @string)
    WHILE @start < LEN(@string) + 1 BEGIN 
        IF(SUBSTRING(@string, @colon + 1, 1) = ' ')
        BEGIN
            INSERT INTO 
                @output (columnName, outValue)  
            VALUES
                ((SUBSTRING(@string, @start, @colon - @start)) ,(SUBSTRING(@string, @colon + 2,  @end - @colon - 2)))
            SET @start = @end + 1 
            SET @colon = CHARINDEX(@delimiter, @string, @start)
            SET @end = CHARINDEX(CHAR(33), @string, @start)
        END
        ELSE
        BEGIN
            INSERT INTO 
                @output (columnName, outValue)  
            VALUES
                ((SUBSTRING(@string, @start, @colon - @start)) ,(SUBSTRING(@string, @colon + 1,  @end - @colon - 1)))
            SET @start = @end + 1 
            SET @colon = CHARINDEX(@delimiter, @string, @start)
            SET @end = CHARINDEX(CHAR(33), @string, @start)
        END
    END 
    RETURN 
END

I had to change a lot to the function because this function was originally meant for getting values AFTER a delimiter up to the end of the string, which is pretty straightforward. But mine had to go from the delimiter to the !, which is char(33).

Step 3:
I wrote a stored procedure that handles all these operations. I already had a pretty lengthy stored procedure in place for my requirement and I had to add the results of this function joined with the results of a select statement to a new CTE and use that in the final result set.

For anyone that is curious, found in another SO question that the join of a table and a function can be done like so:

SELECT
    table1.columnName1
    ,function1.columnName2
    ,function1.columnName3
FROM
    tableName1 table1
CROSS APPLY
    dbo.functionName1(table1.columnName2, ':') function1

That did it.

I hope this helps someone in the future.

Crazy Cucumber
  • 479
  • 8
  • 36