
I want a split function in SQL server. I came across this thread: Cannot find either column "dbo" or the user-defined function or aggregate "dbo.Splitfn", or the name is ambiguous

and I feel it does too many index-based calculations. I wrote this function instead:

ALTER FUNCTION [dbo].[Split]
(
    @Data       varchar(8000),
    @Delimiter  char(1) = ','
)
RETURNS @RetVal TABLE 
(
    Data    varchar(8000)
)
AS
Begin
    Set @Data = RTrim(LTrim(IsNull(@Data,'')))
    Set @Delimiter = IsNull(@Delimiter,',')

    If Substring(@Data,Len(@Data),1) <> @Delimiter
    Begin
        Set @Data = @Data + @Delimiter
    End

    Declare @Len int = Len(@Data)
    Declare @index int = 1
    Declare @Char char(1) = ''
    Declare @part varchar(8000) = ''

    While @index <= @Len
    Begin

        Set @Char = Substring(@Data,@index,1)
        If @Char = @Delimiter And @part <> ''
        Begin
            Insert into @RetVal Values (@part)
            Set @part = ''
        End
        Else
        Begin
            Set @part = @part + @Char
        End

        Set @index = @index + 1
    End

    RETURN;
End

Can anybody comment on which one is more efficient? I will be using this function heavily to split data in one of my scraping applications, and I want it to be efficient. Also, please mention how you measured its efficiency.

Jack
    Have you tried splitting a bunch of sample data - say 10,000 randomly generated rows - and comparing how long it takes between both methods? That's one of the best ways to determine actual efficiency. – mellamokb Aug 23 '12 at 13:01
  • @mellamokb: That's a great idea. If you read the last line of my question, that's exactly what I want to know. How do I compare the time? Any sample would be useful. – Jack Aug 23 '12 at 13:02
  • 2
    Store the value of `SYSDATETIME()` before running, then compare it to the value of `SYSDATETIME()` after running. – Adam Robinson Aug 23 '12 at 13:03
  • And @mellamokb, that should probably be an answer. – Adam Robinson Aug 23 '12 at 13:03
  • 2
    Any discussion of this topic should point to these articles https://sqlblog.org/2009/08/06/more-on-splitting-lists-custom-delimiters-preventing-duplicates-and-maintaining-order and https://sqlblog.org/2010/07/07/splitting-a-list-of-integers-another-roundup. – Gordon Linoff Aug 23 '12 at 13:07
  • 2
    @GordonLinoff - And these new articles by Aaron again: http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings and http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-follow-up – Lamak Aug 23 '12 at 13:47
  • You're missing the last 'word' of the original string with the above code. Try executing `SELECT * FROM [dbo].[Split]('Lorem ipsum dolor sit amet', ' ')` you will get 4 rows while it should return 5. – Luis Quijada Aug 23 '12 at 14:49
  • And one more on TVPs: http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql – Aaron Bertrand Aug 23 '12 at 15:07

4 Answers


Rather than discuss the relative efficiency of different string-splitting methods, I tend to try to get people to stop doing this in T-SQL at all. You can spend hours fighting with inefficient functions to try and squeeze a few extra microseconds out of them, but it's an exercise in futility. T-SQL is inherently slow at this task, and it's much better to go outside of T-SQL - either by using CLR (2005) or table-valued parameters (TVPs) (2008+). I recently published a three-part series on this that is likely worth a read, and I suspect you'll come to the same conclusions I did (CLR is good, TVPs are better, and all T-SQL methods just look silly in comparison):

http://www.sqlperformance.com/2012/07/t-sql-queries/split-strings

http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-follow-up

http://www.sqlperformance.com/2012/08/t-sql-queries/splitting-strings-now-with-less-t-sql
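As a rough sketch of the TVP approach those articles recommend - the type and procedure names here are illustrative, not taken from the articles - the application splits the string itself and passes the pieces in as a table:

```sql
-- Hypothetical table type; the application passes already-split values,
-- so SQL Server never has to parse a delimited string at all.
CREATE TYPE dbo.StringList AS TABLE (Item varchar(8000));
GO
CREATE PROCEDURE dbo.ProcessItems
    @Items dbo.StringList READONLY
AS
BEGIN
    -- the "split" is already done; just consume the rows
    SELECT Item FROM @Items;
END
```

From ADO.NET you would pass a DataTable (or IEnumerable&lt;SqlDataRecord&gt;) as a parameter of SqlDbType.Structured.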

Also, please mention how you measured its efficiency.

Well, you can do what I did in those articles: select SYSDATETIME() before and after you run each test, and then calculate the difference. You can also log to a table before and after each test, use Profiler to capture the duration, or surround your test with:

SET STATISTICS TIME ON;

PRINT 'Test 1';

-- do test 1 here

PRINT 'Test 2';

-- do test 2 here

SET STATISTICS TIME OFF;

You'll get output in the messages pane like:

Test 1

SQL Server execution times:
  CPU time: 247 ms, elapsed time: 345 ms

Test 2

SQL Server execution times:
  CPU time: 332 ms, elapsed time: 421 ms
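The SYSDATETIME() approach mentioned above can be sketched like this (the function name and sample string are just placeholders):

```sql
DECLARE @start datetime2 = SYSDATETIME();

-- run the call under test
SELECT * FROM dbo.Split('a,b,c,d,e', ',');

-- elapsed wall-clock time in microseconds
SELECT DATEDIFF(MICROSECOND, @start, SYSDATETIME()) AS elapsed_us;
```

For short-running calls, run the test in a loop over many iterations and average the result; a single call can finish in less than the timer's effective resolution.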

Finally, you can use our free tool, SQL Sentry Plan Explorer. (Disclaimer: I work for SQL Sentry.)

You can feed any query into Plan Explorer, generate an actual plan, and in addition to a graphical plan that is much more readable than the showplan put out by Management Studio, you also get runtime metrics such as duration, CPU, and reads. So you can run two queries and compare them side by side without doing any of the above.


Aaron Bertrand

Another different approach:

CREATE FUNCTION [dbo].[fGetTableFromList]
(
    @list VARCHAR(max), @delimiter VARCHAR(10)
)
RETURNS @table TABLE
(value VARCHAR(8000)) AS
BEGIN

DECLARE @list1 VARCHAR(8000), @pos INT;

SET @list = LTRIM(RTRIM(@list)) + @delimiter
SET @pos = CHARINDEX(@delimiter, @list, 1)

WHILE @pos > 0
    BEGIN
        SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))

        IF @list1 <> ''
            INSERT INTO @table(value) VALUES (@list1)

        SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
        SET @pos = CHARINDEX(@delimiter, @list, 1)
    END
RETURN 
END
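For instance (values are trimmed and empty items are skipped):

```sql
SELECT value FROM dbo.fGetTableFromList('a, b, , c', ',');
-- returns three rows: 'a', 'b', 'c'
```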

In CPU time, there is not much difference between dbo.SplitString, dbo.Split, and my dbo.fGetTableFromList. I verified this by executing:

SET STATISTICS TIME ON;

SELECT * FROM [dbo].[Split]('Lorem ipsum dolor sit amet,...', ' ');
SELECT * FROM [dbo].[SplitString]('Lorem ipsum dolor sit amet,...', ' ');
SELECT * FROM [dbo].[fGetTableFromList]('Lorem ipsum dolor sit amet,...', ' ');

SET STATISTICS TIME OFF;

Of course, the more timing samples you collect, testing with different inputs, the more accurate an idea you'll get of which function performs better.

You should also pay attention to the execution plan. Remove the SET STATISTICS statements, execute the three queries above, and tell SSMS to show you the execution plan.

Just by looking at the summary presented in the "Execution plan" tab, and without examining its details, you can see that the 1st one, Split, costs 13% of the estimated effort, the second one, SplitString, 60%, and the 3rd one, fGetTableFromList, 13% again (the rest of the work is spent on the SELECTs).

This is the "dummy" way, not one for DBAs. If you need an accurate or precise benchmark, you should write some stress tests and extract precise results (as in the links provided by @AaronBertrand).
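A rough stress harness along those lines - a derived numbers table drives ~10,000 calls, and the function name is whichever one you are testing:

```sql
SET STATISTICS TIME ON;

-- ~10,000 invocations of the split function against the same sample string
SELECT n.n, s.value
FROM (SELECT TOP (10000)
             ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
      FROM sys.all_objects a CROSS JOIN sys.all_objects b) AS n
CROSS APPLY dbo.fGetTableFromList('alpha,beta,gamma,delta,epsilon', ',') AS s;

SET STATISTICS TIME OFF;
```

Comparing the CPU time reported for each function over the same 10,000 calls gives a much fairer picture than timing a single invocation.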

Luis Quijada

Try this:

CREATE FUNCTION dbo.SplitString(@inputStr varchar(1000), @del varchar(5))
RETURNS @table TABLE(col varchar(100))
AS
BEGIN

    DECLARE @t table(col1 varchar(100))
    INSERT INTO @t
    SELECT @inputStr

    IF CHARINDEX(@del,@inputStr,1) > 0
    BEGIN
        ;WITH CTE AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS id, * FROM @t)
        ,CTE1 AS (
            SELECT id, LTRIM(RTRIM(LEFT(col1, CHARINDEX(@del,col1,1)-1))) AS col,
                   RIGHT(col1, LEN(col1)-CHARINDEX(@del,col1,1)) AS rem
            FROM CTE
            UNION ALL
            SELECT c.id, LTRIM(RTRIM(LEFT(rem, CHARINDEX(@del,rem,1)-1))) AS col,
                   RIGHT(rem, LEN(rem)-CHARINDEX(@del,rem,1))
            FROM CTE1 c
            WHERE CHARINDEX(@del,rem,1) > 0
        )
        INSERT INTO @table
        SELECT col FROM CTE1
        UNION ALL
        SELECT rem FROM CTE1 WHERE CHARINDEX(@del,rem,1) = 0
    END
    ELSE
    BEGIN
        INSERT INTO @table
        SELECT col1 FROM @t
    END

    RETURN

END
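A quick check of the recursive version; note that a recursive CTE is subject to SQL Server's default recursion limit of 100 levels, so very long lists (more than ~100 delimiters) can hit the "maximum recursion" error, and the MAXRECURSION hint cannot be specified inside a function body:

```sql
-- splits on ';' -- expect three rows: red, green, blue
SELECT col FROM dbo.SplitString('red;green;blue', ';');
```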
AnandPhadke

@AnandPhadke,

I don't understand what you are doing with CTE. This works perfectly fine:

Create function dbo.SplitString(@inputStr varchar(1000),@del varchar(5))
RETURNS @table TABLE(col varchar(100))
As
BEGIN

DECLARE @t table(col1 varchar(100))
INSERT INTO @t
select @inputStr

if CHARINDEX(@del,@inputStr,1) > 0
BEGIN
    ;WITH CTE1 as (
    select ltrim(rtrim(LEFT(col1,CHARINDEX(@del,col1,1)-1))) as col,RIGHT(col1,LEN(col1)-CHARINDEX(@del,col1,1)) as rem from @t
    union all
    select ltrim(rtrim(LEFT(rem,CHARINDEX(@del,rem,1)-1))) as col,RIGHT(rem,LEN(rem)-CHARINDEX(@del,rem,1))
    from CTE1 c
    where CHARINDEX(@del,rem,1)>0
    )

        INSERT INTO @table 
        select col from CTE1
        union all
        select rem from CTE1 where CHARINDEX(@del,rem,1)=0
    END
ELSE
BEGIN
    INSERT INTO @table 
    select col1 from @t
END

RETURN

END
Jack
  • Ohh, that's right. I was initially using that id col from the CTE, later changed the logic, and forgot to remove that CTE. Really good catch. So will this take less time now? – AnandPhadke Aug 23 '12 at 14:27
  • @AnandPhadke: Not really. This is still slow probably due to recursion and UNION ALL. – Jack Aug 24 '12 at 02:46