T-SQL - How can I split a variable into 4 columns for output?

Question

This is a T-SQL question, I use Microsoft SQL Server 2014. I am joining three tables, which is very straightforward. The tricky part is this: one of the variables, SubtotalKey, takes the form ‘ABD_1999_MAE_1’. I would like to split this variable into its four components, delimited by the underscore, and include the four columns in my output at a specific point in the query. I have a workable solution which uses a scalar function. Workable in the sense that it does what I have just described…but there are performance issues which render it unusable. I’ve since converted the scalar function to a table-valued function, and am using ‘outer apply’ on it as a solution. Unfortunately, this results in 4 rows in the output per result row from the join. Not sure where to go from here - I tried pivot but pivot needs numerical columns to pivot, I think. All help much appreciated.

The table value function in the code below, ufn_SplitString, will split the string above into a table with 1 column and 4 rows. The 4 rows contain the values, respectively, ABD, 1999, MAE, 1.

For the purposes of this question, there are 4 elements in SubtotalKey, but in reality, the number will be variable. Is a solution possible, if I don’t know in advance what the number of required extra columns will be? Here is my code so far:

SELECT 
    t1.t_proj AS time_period,
    ufn.item,
    t3.AnnClaimVal AS annuity_outgo_smbel,
    t3.DeathClaimVal AS death_outgo_smbel,
    t2.SolvSurvXCF AS annuity_outgo_reins,
    t2.SolvDeathXCF AS death_outgo_reins,
    t2.ReinSwapXCF AS mortswap_fixedleg_payment,
    t3.ExpenseValXCF AS ren_exp,
    t1.InvExpSCF AS inv_exp,
    t1.InvExpReinSCF AS inv_exp_reins

    FROM [sch_ImmAnn].[viw_mdlEV_Formulae] t1
    INNER JOIN [sch_ImmAnn].[viw_mdl_Formulae] t2 
    ON t1.t_proj = t2.t_proj AND t1._SubtotalKey = t2._SubtotalKey AND t1._Scenario = t2._Scenario AND t1._ExecRun_UID = t2._ExecRun_UID 
    INNER JOIN [sch_ImmAnn].[viw_mdlValue_Formulae] t3 
    ON t1.t_proj = t3.t_proj AND t1._SubtotalKey = t3._SubtotalKey AND t1._Scenario = t3._Scenario AND t1._ExecRun_UID = t3._ExecRun_UID 
    OUTER APPLY sch_Common.ufn_SplitString(t1._SubtotalKey,'_') ufn
    WHERE t1._ExecRun_UID = @ExecUID AND t1._Scenario = @Scenario
    AND t1.t_proj >= 0 AND  t1.t_proj <= 650
    ORDER BY SubtotalKey, time_period

Here is some sample data for t1:

t_proj SubtotalKey   Scenario    ExecRun_UID InvExpSCF InvExpReinSCF
1      ABD_1999_MAE_1       1       36FA21C8 5334.44   37.88
2      EMM_E12_MAE_3        1       36FA21C8 1894.88   1298.3
3      XYZ_2008_MAE_1       1       36FA21C8 12.99     10009.33

Here is some sample data for t2:

t_proj SubtotalKey   Scenario    ExecRun_UID SolvSurvXCF   SolvDeathXCF ReinSwap    

1      ABD_1999_MAE_1       1       36FA21C8 543.88        12.33          1.2
2      EMM_E12_MAE_3        1       36FA21C8 2985.11       59.31          4.6
3      XYZ_2008_MAE_1       1       36FA21C8 309999.12     111.33         9.7

Here is some sample data for t3:

t_proj SubtotalKey   Scenario    ExecRun_UID ExpenseValXCF AnnClaimVal DeathClaimVal 
1      ABD_1999_MAE_1       1       36FA21C8 100           901         678
2      EMM_E12_MAE_3        1       36FA21C8 200           492         121
3      XYZ_2008_MAE_1       1       36FA21C8 554           510         144

Here is the desired output:

t_proj  Col1   Col2   Col3   Col4   Scenario   ExecRun_UID   InvExpSCF   InvExpReinSCF   SolvSurvXCF   SolvDeathXCF   ReinSwap   ExpenseValXCF  AnnClaimVal DeathClaimVal
1       ABD    1999   MAE    1      1          36FA21C8      5334.44     37.88           543.88         12.33          1.2       100             901         678
2       EMM    E12    MAE    1      1          36FA21C8      1894.88     1298.3          2985.11        59.31          4.6       200             492         121
3       XYZ    2008   MAE    1      1          36FA21C8      12.99       10009.33        309999.12      111.33         9.7       554             510         144

Function code:

ALTER FUNCTION [sch_Common].[ufn_SplitString]
(     
      @Input NVARCHAR(MAX),
      @Character CHAR(1)
)

RETURNS @Output TABLE (
      Item NVARCHAR(1000)
)

AS

BEGIN

      DECLARE @StartIndex INT, @EndIndex INT

      SET @StartIndex = 1
      IF SUBSTRING(@Input, LEN(@Input) - 1, LEN(@Input)) <> @Character
      BEGIN
            SET @Input = @Input + @Character
      END

      WHILE CHARINDEX(@Character, @Input) > 0
      BEGIN
            SET @EndIndex = CHARINDEX(@Character, @Input)
            INSERT INTO @Output(Item)
            SELECT SUBSTRING(@Input, @StartIndex, @EndIndex - 1)
            SET @Input = SUBSTRING(@Input, @EndIndex + 1, LEN(@Input))
      END
      RETURN
END

You will have to do a dynamic cross tab or dynamic pivot because you don't have a fixed number of columns each time this executes. — Sean Lange, Oct 18 '17 at 13:17
This question is not the same as the other one, because I am asking whether it is possible to split an existing variable into multiple variables. The other question doesn't address that. There is a similarity in that I don't ultimately know how many columns there will be. — mediaeval, Oct 18 '17 at 13:25
You stated that you are splitting these values and your code has a split function in it. Now you want to make those rows into columns right? That is the technique described in the question I marked as a duplicate. I will reopen this but not really sure what your question is. — Sean Lange, Oct 18 '17 at 13:28
In order to understand your question you need to provide some details here. Table definition including sample data. Probably need to see your split function also. And then the desired results based on your sample data. — Sean Lange, Oct 18 '17 at 13:30
Can you share your function code also? Since this question is mostly about performance posting the table ddl including indexes would help also. — Sean Lange, Oct 19 '17 at 13:04
posted function code. where do I retrieve the table ddl information...? — mediaeval, Oct 19 '17 at 14:13
I had a gut feeling your splitter would look like that. What you have there is a multi-statement table valued function. These are actually generally worse for performance than their scalar function brothers. Your has the added performance issue of looping. What you want is an inline table valued function that does your split set based. My favorite is Jeff Moden's. You can find it here. http://www.sqlservercentral.com/articles/Tally+Table/72993/ Some other excellent alternatives can be found here. https://sqlperformance.com/2012/07/t-sql-queries/split-strings — Sean Lange, Oct 19 '17 at 14:44
I would honestly just start with trashing that current splitter you have with one of the ones above. I would not be surprised if the performance improves exponentially. As for generating ddl, you can right click the table in SSMS and script -> as create.... — Sean Lange, Oct 19 '17 at 14:45
Actually, I have experienced no performance issues with the current splitter. If I exclude the ufn.item in the Select statement, and its OUTER APPLY counterpart beneath the FROM statement, the code takes 0:31 to run and returns 173,817 rows. When I include it, it takes 1 minute and 9 seconds to run and produces 660,765 rows. I'm running out of characters now to say all I want to say. But I think Yogesh's solution, below, provides exactly what I want, but I am unable to work it into my code. It eliminates the need for a splitter function competely. — mediaeval, Oct 20 '17 at 09:38
I would argue that more than doubling the execution time is a major performance problem. Honestly 31 seconds to get 173 thousand rows is pretty slow. — Sean Lange, Oct 20 '17 at 13:57
It's a valid point. I can live with it though, for what I want it to do. Can you offer any advice on Yogesh's method below...? Thanks for your help. — mediaeval, Oct 20 '17 at 14:08
Without seeing the code you are using and understanding what you are trying to accomplish there isn't much to offer. But using a better splitter will help this process and any other process where you need to parse values. — Sean Lange, Oct 20 '17 at 14:09

Yogesh Sharma · Accepted Answer · 2017-10-23T06:01:19.033

Assuming that You looking something as below (Varibale : ABD_1999_MAE_1):

Column1 Column2 Column3 Column4
ABD     1999    MAE     1

If, above is correct then You could use XML method and CROSS APPLY

SELECT DISTINCT
       A.t_proj,
       split.a.value('/X[1]', 'NVARCHAR(MAX)') Col1,
       split.a.value('/X[2]', 'NVARCHAR(MAX)') Col2,
       split.a.value('/X[3]', 'NVARCHAR(MAX)') Col3,
       split.a.value('/X[4]', 'NVARCHAR(MAX)') Col4,
       A.Scenario,
       A.ExecRun_UID,
       A.InvExpSCF,
       A.InvExpReinSCF,
       A.SolvSurvXCF,
       A.SolvDeathXCF,
       A.ReinSwap,
       A.ExpenseValXCF,
       A.AnnClaimVal,
       A.DeathClaimVal
FROM
(
    SELECT T1.t_proj,
           CAST('<X>'+REPLACE(T1.SubtotalKey, '_', '</X><X>')+'</X>' AS XML) AS String,
           T1.Scenario,
           T1.ExecRun_UID,
           T1.InvExpSCF,
           T1.InvExpReinSCF,
           T2.SolvSurvXCF,
           T2.SolvDeathXCF,
           T2.ReinSwap,
           T3.ExpenseValXCF,
           T3.AnnClaimVal,
           T3.DeathClaimVal
    FROM T1
         INNER JOIN  T2 ON T2.t_proj = T1.t_proj
                              AND T2.SubtotalKey = T1.SubtotalKey
                              AND T2.Scenario = T1.Scenario
         INNER JOIN  T3 ON T3.t_proj = T1.t_proj
                              AND T3.SubtotalKey = T3.SubtotalKey
                              AND T3.Scenario = T1.Scenario
) AS A
CROSS APPLY String.nodes('/X') split(a);

Desired Result :

t_proj  Col1   Col2   Col3   Col4   Scenario   ExecRun_UID   InvExpSCF   InvExpReinSCF   SolvSurvXCF   SolvDeathXCF   ReinSwap   ExpenseValXCF  AnnClaimVal DeathClaimVal
1       ABD    1999   MAE    1      1          36FA21C8      5334.44     37.88           543.88         12.33          1.2       100             901         678
2       EMM    E12    MAE    3      1          36FA21C8      1894.88     1298.3          2985.11        59.31          4.6       200             492         121
3       XYZ    2008   MAE    1      1          36FA21C8      12.99       10009.33        309999.12      111.33         9.7       554             510         144

I can see how this works for a test value in place of . How can I work this into my existing code? — mediaeval, Oct 18 '17 at 13:28
many thanks. This is exactly what I was looking for. One final question...why is the "split" there? If I remove it, it still works fine. Although the Microsoft page on this does the same thing. Thanks again. — mediaeval, Oct 24 '17 at 09:40

T-SQL - How can I split a variable into 4 columns for output?

1 Answers1