0

I need to apply a procedure on every record's NVARCHAR(MAX) field in a table. The procedure will receive a large string and split it into several shorter strings (less than 100 chars). The procedure will return a result set of smaller string. These strings will be inserted into a different table (each in its own row).

How can I apply this procedure in a set-based fashion to the whole table, so that I can insert the results into another table?

I've found some similar questions on SO, however they didn't need to use the INSERT INTO construct. This means UDF and TVF functions are off the table. EDIT: functions do not support DML statements. I wanted to use INSERT INTO inside the function.

Alternatively, is there a set-based way of using a stored procedure? SELECT sproc(Text) FROM Table didn't work.

Community
  • 1
  • 1
Howie
  • 2,760
  • 6
  • 32
  • 60
  • You may be able to do a combination of top 1 split(text) with recursive CTE to produce a list that can ultimately insert into another table. Ultimately it will be a very cool query - but your head will explode before you finish admiring it. :) – Anthony Horne Apr 16 '14 at 09:06
  • While CTEs are cool, I think I'll just loop it in the end :) – Howie Apr 16 '14 at 09:09
  • 2
    Forgive me if I am stating the obvious, but don't forget that for large sets of data, a standard "while" is quicker than using a cursor. Cursors are evil! – Anthony Horne Apr 16 '14 at 09:12
  • You can pass multiple records to a procedure using [table-valued parameters](http://msdn.microsoft.com/en-us/library/bb675163%28v=vs.110%29.aspx), but I unless you can make your split procedure set based rather than procedural I suspect this will just be moving your loop to the procedure rather than actually removing it completely. – GarethD Apr 16 '14 at 09:13
  • 1
    @AnthonyHorne A while loop is not necessarily quicker than a cursor, this is a myth. I think the myth is propagated by people using the wrong type of cursor, i.e. if you only need to move forwards, and only need to read from the cursor, declare it as `FORWARD_ONLY READ_ONLY` to reduce the overhead. For further reading, [Aaron Bertrand has written a good article on this](https://sqlblog.org/2012/01/26/bad-habits-to-kick-thinking-a-while-loop-isnt-a-cursor). – GarethD Apr 16 '14 at 09:40
  • N.B I am not advocating procedural approaches, I would always look for a set based solution where possible just pointing out that using a while loop over a cursor rarely actually solves anything. – GarethD Apr 16 '14 at 09:41
  • I don't understand from your explanation why you can't convert your SP to a TVF. Then you could use `CROSS APPLY` to run the TVF for each row. – Nick.Mc Apr 16 '14 at 09:49
  • @Howie Can you show what your procedure does? It may be possible to apply the logic in a set based way? – GarethD Apr 16 '14 at 09:49
  • @GarethD Firstly, I do agree with your last statement - SET is Best and I will ALWAYS try that. On the Cursor side, cursors definitely get much slower the greater the record count, whereas general loops do not. I am reading the referred articles as we speak. Thanks. – Anthony Horne Apr 16 '14 at 09:50
  • @ElectricLlama I have not yet written the procedure, but it will return a result set of strings. What I don't know is how to insert that result set into a different table? Is that possible to do outside the TVF? – Howie Apr 16 '14 at 09:55
  • @GarethD Interesting article. I did notice that every iteration of the while test, led the colleagues to create temp tables without any form of index (#tmp vs. @tmp) instead of just looping directly, etc. In my test, the creation of colleague 1's temp table took more than half of the execution time - meaning the loop may even have been faster (or quicker). Also, Aaron's test uses very few records - which is where cursors still excel. Large Cursors crawl. You may recognise the infrequent error with cursors: "A cursor with the name '%.*' already exists.". As you say - check/compare first. – Anthony Horne Apr 16 '14 at 10:14
  • So basically your TVF will take a single string as input then return these split (in multiple records). If these different strings go into different tables you would just create an INSERT/SELECT for each table and a WHERE clause filtering in each case. I could pose a solution if you post some sample data. – Nick.Mc Apr 16 '14 at 10:58

1 Answers1

1

I am not sure of your exact logic to split the string, but if possible you can make your split function an inline TVF (Heres one I made earlier):

CREATE FUNCTION dbo.Split(@StringToSplit NVARCHAR(MAX), @Delimiter NCHAR(1))
RETURNS TABLE
AS
RETURN
(   
    SELECT  Position = Number,
            Value = SUBSTRING(@StringToSplit, Number, CHARINDEX(@Delimiter, @StringToSplit + @Delimiter, Number) - Number)
    FROM    (   SELECT  TOP (LEN(@StringToSplit) + 1) Number = ROW_NUMBER() OVER(ORDER BY a.object_id)
                FROM    sys.all_objects a
            ) n
    WHERE   SUBSTRING(@Delimiter + @StringToSplit + @Delimiter, n.Number, 1) = @Delimiter
);

Then you can simply use this in your insert statement by using cross apply with the TVF:

DECLARE @T1 TABLE (ID INT IDENTITY, TextToSplit NVARCHAR(MAX) NOT NULL);
DECLARE @T2 TABLE (T1ID INT NOT NULL, Position INT NOT NULL, SplitText NVARCHAR(MAX) NOT NULL);

INSERT @T1 (TextToSplit) 
VALUES ('This is a test'), ('This is Another Test');

INSERT @T2 (T1ID, Position, SplitText)
SELECT  t1.ID, s.Position, s.Value
FROM    @T1 t1
        CROSS APPLY dbo.Split(t1.TextToSplit, N' ') s;

SELECT  *
FROM    @T2;
GarethD
  • 68,045
  • 10
  • 83
  • 123
  • Oh, I see. I don't need nor want to have an `INSERT` statement inside the TVF! This cleared up a few things for me :) – Howie Apr 16 '14 at 11:03
  • Not only that, Functions aren't allowed to alter any data... so you can't anyway! – Nick.Mc Apr 17 '14 at 05:05