1

I'm curently storing a list of ids in a column as a CSV string value ('1;2;3').

I'd like to optimize with a better approach (I believe) which would use varbinary(max).

I'm looking for tsql functions
1 . That would merge side by side a set of integer rows into a varbinary(max)
2 . That would split the varbinary field into a set of int rows

Any tips appreciated, thanks

TheGameiswar
  • 27,855
  • 8
  • 56
  • 94
uzul
  • 1,096
  • 9
  • 23
  • 1
    A better approach would be to normalise your data. If you have three items of data which are the same "type" of information, then yes, they do belong in the same column, but they belong in *separate* rows. – Damien_The_Unbeliever Aug 05 '16 at 09:08
  • 2
    No. It would not be better. It would still be keeping multiple values in the same column. You better create another table to hold these values one for each row. For more information, Read [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad), where you will see a lot of reasons why the answer to this question is **Absolutly yes!** – Zohar Peled Aug 05 '16 at 09:09
  • This is not a question about the 'best practices'. I'm storing these values as results of a precomputed costy operation. – uzul Aug 05 '16 at 09:19
  • If this is not about "best practices", how are we meant to interpret "I'd like to optimize with a better approach". And you haven't even explained what reason you have for believing that performing two additional transformations on this data will be "better" – Damien_The_Unbeliever Aug 05 '16 at 09:20
  • I meant 'a better approach of storing multiple values'. Storing multiple values as a varbinary(max) is what Microsoft is doing in their GraphView project – uzul Aug 05 '16 at 09:26

1 Answers1

0

Solution is very questionable. I'd also suggest to normalize the data.
However, if you still want to store your data as VARBINARY, here is the solution:

CREATE FUNCTION dbo.fn_String_to_Varbinary(@Input VARCHAR(MAX))
RETURNS VARBINARY(MAX) AS
BEGIN
DECLARE @VB VARBINARY(MAX);
WITH CTE as (
    SELECT CAST(CAST(LEFT(IT,CHARINDEX(';',IT)-1) AS INT) as VARBINARY(MAX)) as VB, RIGHT(IT,LEN(IT) - CHARINDEX(';',IT)) AS IT
    FROM (VALUES (@Input)) as X(IT) union all
    SELECT VB + CAST(CAST(LEFT(IT,CHARINDEX(';',IT)-1) AS INT) as VARBINARY(MAX)) as VB, RIGHT(IT,LEN(IT) - CHARINDEX(';',IT)) AS IT FROM CTE WHERE LEN(IT) > 1
)
SELECT TOP 1 @VB = VB FROM CTE
ORDER BY LEN(VB) DESC
RETURN @VB
END 
GO
DECLARE @Input VARCHAR(MAX) = '421;1;2;3;5000;576;842;375;34654322;18;67;000001;1232142334;'
DECLARE @Position INT = 9
DECLARE @VB VARBINARY(MAX)
SELECT @VB = dbo.fn_String_to_Varbinary(@Input)
SELECT @VB, CAST(SUBSTRING(@VB,4*(@Position-1)+1,4) AS INT)
GO

The function converts string into VARBINARY and then script extracts 9th number from that VARBINARY value.

Do not run this function against a data set with million records and million numbers in each line.

Slava Murygin
  • 1,951
  • 1
  • 10
  • 10
  • Took me some time to understand it :) What I'm really looking for is a way to store the ints as 4 bytes memory slots side by side, and without using strings anymore. wonder if tsql can handle that – uzul Aug 06 '16 at 03:12
  • How many values do you want to store? Theoretically, you can have up-to 30K INT columns. However, I'd better suggest to store these values in separate table. It will be much easier/faster to extract. – Slava Murygin Aug 07 '16 at 17:24