
My C# application calls a stored procedure that includes the following UPDATE statement to concatenate a VARCHAR(max) column with the string @input that is passed in as a parameter:

UPDATE [TblTest] 
SET [txtData] = [txtData] + @input
WHERE [TblTestID] = @id

@input will typically have a length of a million characters.
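For reference, the stored procedure wrapping this statement looks something along these lines (the procedure name and parameter declarations below are placeholders rather than the actual code):

CREATE PROCEDURE dbo.AppendTestData      -- placeholder name
    @id    INT,
    @input VARCHAR(MAX)
AS
BEGIN
    UPDATE [TblTest]
    SET [txtData] = [txtData] + @input
    WHERE [TblTestID] = @id;
END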

The stored procedure is called multiple times. As the length of the string stored in the txtData column increases, the performance of the UPDATE statement degrades considerably. For example, it takes over 30 seconds to execute once txtData holds over 300 million characters, even with just a single record in the table.

Are there any options available to optimise the SQL to improve performance? I'm using SQL Server 2008 R2.

Thanks for your help.

aw1975
  • Well it has to write 300 million characters to disk. Why do you need to store that much data in a single tuple? – Sean Lange Feb 06 '17 at 20:34
  • You might consider storing the values in the file system and just keeping a filename for the reference. That is a lot of data. – Gordon Linoff Feb 06 '17 at 20:35
  • Is `TblTestID` the primary key of `TblTest`? You are concatenating a big chunk of data, though – Lamak Feb 06 '17 at 20:36
  • `SET [txtData] = [txtData] + @input` might transfer more than 300MB for a single record (read data, concatenate, write back). If I'm not mistaken, SQL is not really meant for fast string operations. What are you actually storing there? Can't you break up the information? – Alexei - check Codidact Feb 06 '17 at 20:38
  • That is really kind of an absurd amount of information. The entire King James Bible has about 3.5 million letters. You are storing nearly 100 copies of the entire Bible as a single string. Of course it is going to be slow. – Sean Lange Feb 06 '17 at 20:39
  • @SeanLange Agree. `TblTest` caches the results of a request to a (poorly designed) legacy web service. Subsequent requests read from this table instead of the source data. Previously the results were added to this table as a single insert (which was much quicker); however, in order to reduce the memory footprint in the C# app, the results are now processed in chunks. I may consider storing the chunks in separate rows to avoid concatenating (see the sketch after these comments), or perhaps store them in the file system as suggested above. The answers given below sound really promising and may offer me a quicker fix (using the `WRITE` syntax). – aw1975 Feb 06 '17 at 21:49
  • @Lamak yes `TblTestID` is the primary key of `TblTest`. – aw1975 Feb 06 '17 at 21:50
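The chunk-per-row idea mentioned in the comments could look roughly like this. This is only a sketch; the chunk table and column names are assumptions, not part of the original schema:

CREATE TABLE TblTestChunk (              -- hypothetical table
    TblTestID  INT          NOT NULL,
    ChunkNo    INT          NOT NULL,
    ChunkData  VARCHAR(MAX) NOT NULL,
    CONSTRAINT PK_TblTestChunk PRIMARY KEY (TblTestID, ChunkNo)
);

-- Each call inserts a new row instead of rewriting the ever-growing value
INSERT INTO TblTestChunk (TblTestID, ChunkNo, ChunkData)
VALUES (@id, @chunkNo, @input);

Reads would then reassemble the chunks in ChunkNo order, or simply stream them back to the client one row at a time.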

2 Answers


You can try to use UPDATE with .WRITE (expression, @Offset, @Length) (docs).

According to this source, this is a fast way to update large columns:

A traditional UPDATE statement is logged; using the .WRITE clause is minimally logged (faster).

If @Offset is NULL, the update operation appends expression at the end of the column.

In your case, try this:

UPDATE [TblTest]
SET txtData.WRITE(@input, NULL, 1)  -- with a NULL offset the value is appended and the length argument is ignored
WHERE [TblTestID] = @id
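One caveat: .WRITE cannot be applied to a column value that is currently NULL, so if [txtData] starts out as NULL it needs to be initialised (for example to an empty string) before the first append. A minimal sketch of that initialisation, assuming the same table and parameters as above:

-- One-off initialisation so that subsequent .WRITE appends have a non-NULL value to extend
UPDATE [TblTest]
SET [txtData] = ''
WHERE [TblTestID] = @id
  AND [txtData] IS NULL;

After that, the .WRITE append above only touches the new fragment rather than reading and rewriting the existing value.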
Alexei - check Codidact

You have a performance problem, so start investigating it as such. Read and follow How to analyse SQL Server performance.

Your problem could come from a missing index on [TblTestID], from log or database growth settings (a percentage growth factor causing blocking), from your client-side code slowing down as memory grows and passing in parameters more and more slowly, or from not using the .WRITE syntax for updating BLOBs (UPDATE ... SET txtData.WRITE ...), and so on and so forth. Only by measuring will you be able to find the cause.
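A few starting points for those measurements, as a rough sketch (run in the database in question; these only surface coarse, server-level numbers):

-- Are the data or log files set to grow by a percentage? (expensive once the files are large)
SELECT name, growth, is_percent_growth
FROM sys.database_files;

-- What has the server been waiting on?
SELECT TOP (10) wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

-- Measure the statement itself: turn these on, run the UPDATE, and read the Messages output
SET STATISTICS IO ON;
SET STATISTICS TIME ON;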

Remus Rusanu
  • Also, this old answer may contain some valuable information for your case: http://stackoverflow.com/a/2151491/105929 – Remus Rusanu Feb 06 '17 at 20:49
  • Both yours and Alexei's answers helped me resolve this issue, by using the `WRITE` syntax. Thanks for your help. – aw1975 Feb 07 '17 at 23:34