0

Currently we are using query aggregation for all of our SQL transaction. We are now looking to add BLOB support to the server, which would mean that we will have to convert the byte array to a string, so it can be sent with the query. We would like to have the BLOB data as an byte array on the server, but the server seems to miss some of the characters. Especially '' and '[newline]'.

This is, what I assume, because of SQL Server "Line Continuation". I have tried with different methods, and nothing seems to help. I am wondering if there is a way to tell my stored procedure to not do it or to stop this feature on the server or if there is other ways to bypass this that does not require any changes to the data.

We are using ASCII conversion on the client in c#

Encoding.ASCII.GetString(Blob, 0, Blob.Length).Replace("'", "''")

to make as few conversions as possible.

On the server we convert the string to varbinary with

CONVERT(VARBINARY(MAX), (@BLOB))

I know using

Convert.ToBase64String(BLOB)

will stop this problem. This would, however, also require us to (Convert bytes to string on client)->(Convert string to bytes in SQL)->(Convert bytes to string either on server or client)->(Convert string to byte on client) for one blob to be uploaded and download to and from the server.

We can't go away from aggregation, as it is required to keep the servers running more smoothly, and saving the string would take more space than necessary.

Any knowledge about this is welcome

Edit

Many people in the comments are confused about this question, and I can understand it. I will try to answer to the why, even thou it does not have anything with my OG question to do. There will be a lot of typos, as I am doing this in my underwears during midnight in my bed.

We have testing software solution, which are here and there. They are mostly in factories on bad pc:s with very limited internet access. Each station can have a certain amount DUTs (Device Under Test) running, which all have a test performing on them. Each test have a certain amount of steps to perform. So there can be a station with 1 dut that has 13 test steps and there can be a station with 50 duts that has 40 tet steps. These can be located in China, poland, canada etc (the companies often have most in europe and a couple onanother continent).

We do not have any server side software, so all inserts becomes an injection directly (this is bad, we know, but it is 10 years old legacy code and everything is in rambles as I am the lone fulltimer on this project, while only working for 6 months and were never able to talk with the previous lone fulltimer because of natural causes). Because of this, and low funds for the project, I will not try to do any drastic changes.

So the problem, what I have heard, was that when having 50 threads trying to insert 13 sepparate step objects did not just hog up the servers connection, but also stop the production. The solution was "Query Aggregation" (I thought this was some SQL standard bullshit as I want to remember I have read in the code comments about is but I have not found it) which isbundling everything into a string that is then processed on the server. The server will then slice it up and send the variables to all the stored procedures (I am only the messanger, not the SQL wizard). And when doing this way we need to convert the Blob to a string too, as it is boumd to a step in the test and pipeline for the process is like a labyrinth, so almost impossible to get the id back by changing the calls.

Now I want to change the blob byte array, in c#, with ascii, as MSSQL uses normal ascii conversion when I will cast it from varchar(max) to a varbinary(max). The problem is that some characters mysteriously disapear, when doing so, as I can not convert from the byte array back to a zip when it is downloaded from the server. I have discovered, by comparing the ascii characters one by one, that three characters (or six bytes) has been removed from the array of bytes that has been on the server. In ASCII these three characters are '', '' and '\r', and they all apeared at the end of tehir rows.

By comparing this find with Microsofts own documentation on Line Continuation (which can be viewed here) I thought these cases seemed to be linked, which is why I would like to disable it as I do not use this functionality.

I can also use the

Convert.ToBase64String(BLOB)

as it has never gnerated this problem once, even when sending in a 50 MB file. It can be an coincidence, but I believe that it does something that prevents this scenario from happening (I want to remember that I took it from some brainiack that used this method for the same reason or something like that). The problem is 1. the string becomes very big (I believe 2,6 times bigger than ascii) and 2. the server doesn't convert this way. So when I cast it will asume that I have an ascii string, which means that the bytes on theserver will represent an ascii string of a Base64String representing an byte array, which creates alot of conversions.

We are on the decission step of if we should just skip converting it to a byte array on the server and save it as a varchar(max) instead or if we should keep foghtingto find a solution to keep it as a byte array so the users can just download it without any conversions. And by we I mean my mentor that got one foot in this project and I.

As mentioned before, any knowledge about this is welcome

  • what is "query aggregation"? What are you actually doing with these blobs? Normally I would not expect to convert a blob to Ascii (unless I knew it was actually ascii text data), so I'm not really understanding this question (and perhaps others are also wondering, as am I). – topsail Mar 30 '23 at 19:59
  • You are using terms that are completely non-standard: *query aggregation* what is this? *We are now looking to add BLOB support to the server, which would mean that we will have to convert the byte array to a string* BLOB is just a binary byte array value, why does that mean you need to convert strings if you have a separate column for strings? *I assume, because of SQL Server "Line Continuation"* there is no such thing. *We are using ASCII conversion on the client in c#* I thought you were trying to convert the other way? *On the server we convert the string to varbinary* why not just .... – Charlieface Mar 30 '23 at 20:26
  • .... leave it as binary in the first place? *I know using Convert.ToBase64String(BLOB) will stop this problem* why would you want Base64, and why would it stop problems if there was one? It sounds like you are not fully understanding how to transfer byte arrays as `varbinary`, and possibly you have a misunderstanding on what aggregation is. – Charlieface Mar 30 '23 at 20:27
  • _which would mean that we will have to convert the byte array to a string, so it can be sent with the query_ which is exactly *not* how blob storage works. It sounds like you're trying to use SQL Injection if you're needing to convert your binary values to a string to "send with the query." You should be using parameterized queries instead, then you can send your binary values as `varbinary(max)` query parameters. – AlwaysLearning Mar 30 '23 at 21:37
  • Sql Server supports blob columns, binding blob sql parameters, and multiple statements per sql command. You shouldn't need to convert to a string unless you are using sql files as an offline data interchange format. – Jeremy Lakeman Mar 31 '23 at 00:16

1 Answers1

0

OK I'm not going to go into the whole rat's nest, but you quite clearly have other problems apart from this, and you should strongly consider sweeping away whatever garabage SQL you have already in favour of some properly written code. It's almost certainly not necessary to "aggregate" the data and then break it up again, when you can just use separate parameters and/or table value parameters.

Be that as it may, ideally you would parameterize your query properly. For example, in C# you would do as follows

command.Parameters.Add("@myBlob", SqlDbType.VarBinary, -1).Value = someByteArray;

And then the SQL can refer to that @myBLOB variable directly. This is the safest and most performant method of transferring a byte array to SQL Server, and I recommend you always use this method


If you cannot parameterize your query for whatever reason, a safe method of injection is necessary, which avoids any issues of encoding or misinterpretation. You can pass a hex string like this (C# again)

var blobString = Convert.ToHexString(someByteArray);
var query = $"
DECLARE @myBlob varbinary(max) = 0x{blobString};

INSERT ...
Charlieface
  • 52,284
  • 6
  • 19
  • 43