
I need to convert a binary file (a zip file) into a hexadecimal representation, to then send it to SQL Server as a varbinary(max) function parameter.

A full example (using a very small file!) is:

1) my file contains the following bits 0000111100001111

2) I need a procedure to QUICKLY convert it to 0F0F

3) I will call a SQL Server function passing 0x0F0F as the parameter

The problem is that the files can be large (the average file size is around 100KB, but files up to 100MB are possible), so I need the fastest way to do this.

Otherwise stated: I need to create the string

'0x'+BinaryDataInHexadecimalRepresentation

in the most efficient way. (Note: maybe there is already a way to open a file and obtain a hexadecimal string directly; in that case all I need is to use that way, if it exists.)
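To make the goal concrete, this is roughly the kind of command I want to end up building on the client (a sketch only: FileToHex stands for whatever conversion routine I end up with, and the file path and procedure name are placeholders):

// Sketch only: build the whole T-SQL batch as text; the hex digits are the file content.
procedure SendFileAsHexCommand;
var
  Cmd: AnsiString;
begin
  Cmd := 'EXEC dbo.MyStoredProcedure 0x' + FileToHex('C:\Docs\Document.zip');
  // Cmd would then be executed against SQL Server as an ordinary text command
end;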

Related question: passing hexadecimal data to sql server

UPDATE: after reading the comments I think I need to add more information here. The reason why I try to use a T-SQL text command to send the binary data to the stored procedure is that in this way I remove some overhead from the server: the stored procedure receives the binary data and writes it to a file (this is my final goal). If I use a DAC component I will be able to easily send the binary data to the server, but in that case I need to use a temp table to store the data, and then send this data to the stored procedure that writes the file.

So the idea is:

1) using a T-SQL "long" command: more overhead on the client, because I need to read the file and convert it to hexadecimal to prepare the long command; less overhead on the server, since SQL Server just receives the binary data and processes it in the stored procedure

2) using a DAC: I need to pass the data through a temp table in SQL Server, therefore having more overhead on the server (a rough sketch of this follows below)

Since I am using the server as a web document server (it's a trick), I want to try to reduce the overhead on the server. But maybe I am wrong and (2) is a better technique than (1) anyway.
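For illustration, this is roughly what option (2) would mean on the client, using ADO as one example of a DAC (a sketch only: the table name, procedure name and parameter name are placeholders, not code from my application):

uses ADODB, DB; // for TADOConnection, TADOQuery, ftBlob

// Sketch only: option (2) uploads the raw bytes into a staging/temp table with a
// varbinary(max) column, then calls the stored procedure that writes the file.
procedure SendFileViaTempTable(Conn: TADOConnection; const FileName: string);
var
  Q: TADOQuery;
begin
  Q := TADOQuery.Create(nil);
  try
    Q.Connection := Conn;
    // 1) upload the binary content as a binary parameter (no hex text involved)
    Q.SQL.Text := 'INSERT INTO TempDocuments (Data) VALUES (:Data)';
    Q.Parameters.ParamByName('Data').LoadFromFile(FileName, ftBlob);
    Q.ExecSQL;
    // 2) let the stored procedure pick the row up and write the file on the server
    Q.SQL.Text := 'EXEC dbo.WriteDocumentToFile';
    Q.ExecSQL;
  finally
    Q.Free;
  end;
end;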

UnDiUdin
  • I'm not sure I understand why you need to *convert* the binary file. A zip file doesn't contain only 0 and 1's. – Lieven Keersmaekers May 12 '10 at 08:18
  • What I need to do is pass the binary content of a file to SQL Server as a varbinary(max) datatype. So I need to convert the 0s and 1s that make up a file (to my understanding a file IS made only of 1s and 0s) into hexadecimal, more precisely a string, because I will need to run a query like "EXEC dbo.MyStoredProcedure 0x0F0F" (where 0x0F0F is the varbinary(max) parameter). – UnDiUdin May 12 '10 at 08:29
  • Why not just use parameters and transfer your binary file not as text in a TSQL command, but as a binary value? – da-soft May 12 '10 at 08:42
  • I'm still missing something or you are mixing up some things. A bit is made up of 0 or 1. A byte is made up of 8 bits representing one ASCII character. A zip file is made up of ASCII characters. I still don't see why you need to convert the zip file. – Lieven Keersmaekers May 12 '10 at 09:00
  • Quoting Dmitry: Why not just use parameters and transfer your binary file not as text in a TSQL command, but as a binary value? You mean "why don't I use a DAC front end"? The reason is that I need to pass the binary data to a CLR stored procedure, while connecting to the server with an external IP. I am not aware of other ways to do this. A solution is using a DAC component to send the binary to a temp table with a varbinary(max) field, but my goal is to send the data directly to the stored procedure. That stored procedure will then write it to a file. – UnDiUdin May 12 '10 at 09:11
  • For Lieven: about the bit/byte/ASCII. I need to send data to the DB without a DAC frontend, so the only way I know is to use 0x... in a text T-SQL command. So I need to convert the content of my file into hexadecimal (8 bits at a time, yes). But maybe I am missing some important information that would make my life easier. Anyway, what I am trying to do is to use SQL Server as a kind of "webservice" for serving documents. I try to do this not because it is a best practice, but because it is acceptable in my application and it is a "trick" that I can implement without heavy architectural changes. – UnDiUdin May 12 '10 at 09:14
  • DAC or not to DAC. If what I wrote in the above 2 comments makes sense, the good thing about not using a DAC and using a long T-SQL text command is that I don't need the extra processing of writing to a temp table and then sending that temp table to the stored procedure. This is extra overhead on the server. But maybe it could be the best choice? – UnDiUdin May 12 '10 at 09:15
  • Simply read your binary file one byte at a time and use IntToHex to turn it into text (a hex representation of your binary file). But I'd second everyone else's comment on using anything other than TEXT for transmitting the binary data, as the HEX representation will use exactly twice as many bytes in transit. – Cosmin Prund May 12 '10 at 09:46
  • Not sure, but you can also try to use MSSQL 2008 FILESTREAM feature http://msdn.microsoft.com/en-us/library/bb933993.aspx. That is not what you are exactly looking for, but as an general idea ... – da-soft May 12 '10 at 10:13
  • Cosmin: yes... This is a problem: with HEX I'd have twice the data passing over the network, making it a crazy decision for any file greater than a few kB. I must go for the temp table idea in the database. I don't see any other solution; or is there a way to pass binary data (the content of a zip file) from the client to a stored procedure (as a parameter) without using a temp table, and by the way, is it good practice to use temp tables in this way? – UnDiUdin May 12 '10 at 10:13
  • Dmitry: I need to support 2005 so FILESTREAM is not an option for me. Moreover I have a huge legacy app to maintain, so I am looking for a way to do things by minimally changing the architecture. – UnDiUdin May 12 '10 at 10:13
  • FTP: too complex for deployment. I have 300+ installations. I would need to set up 300 FTP servers. This is why I discarded it. – UnDiUdin May 12 '10 at 10:19
  • If nothing else, then read the file in blocks and use BinToHex to transfer it block by block into text (a rough sketch of this idea follows these comments). – da-soft May 12 '10 at 10:19
  • Ok. I will make a prototype using both DAC+TempTable and BinToHex, run some tests using different file sizes, different connection speeds and see what is most suitable. – UnDiUdin May 12 '10 at 10:27
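A rough sketch of the block-by-block BinToHex idea from the comments (untested; the 64 KB block size is an arbitrary choice, and BinToHex here is assumed to be the RTL routine from the Classes unit):

uses Classes, SysUtils;

// Sketch only: read the file in blocks and let the RTL's BinToHex write
// the hex text directly into the pre-allocated result string.
function FileToHexBlocks(const FileName: string): AnsiString;
const
  BlockSize = 64 * 1024; // arbitrary block size
var
  FS: TFileStream;
  Block: array[0..BlockSize - 1] of Byte;
  Len, OutPos: Integer;
begin
  FS := TFileStream.Create(FileName, fmOpenRead);
  try
    SetLength(Result, FS.Size * 2); // two hex chars per byte
    OutPos := 0;
    Len := FS.Read(Block, SizeOf(Block));
    while Len > 0 do
    begin
      // BinToHex writes 2 * Len characters at the current output position
      BinToHex(PAnsiChar(@Block[0]), PAnsiChar(Result) + OutPos, Len);
      Inc(OutPos, Len * 2);
      Len := FS.Read(Block, SizeOf(Block));
    end;
  finally
    FS.Free;
  end;
end;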

1 Answer


Well, here's an option that does the conversion about as fast as I can think of.

Features of the code:

  • Only one allocation for the string (so no realloc and no move or copy)
  • Fast read from the file.

Since we know that one byte translates to exactly two hexadecimal chars, we know our result string needs to be exactly twice the size of the file. We allocate a string of the required size and then we read from the file in large-enough blocks so the OS can optimize it for us (reading byte-by-byte is evil). We use an actual string, but we write into the string using a pointer:

function TEditorDeschidereDeCredite.FileToHex(FileName: string): AnsiString;
var
  FS: TFileStream;
  PS: PAnsiChar;
  Block: array[0..1023] of Byte; // 1 KB read buffer
  len, i, pos: Integer;
  B: Byte;
const
  Nibbs: array[0..15] of AnsiChar = ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F');
begin
  FS := TFileStream.Create(FileName, fmOpenRead);
  try
    Result := '';
    SetLength(Result, FS.Size * 2); // two hex chars per byte
    PS := PAnsiChar(Result);
    pos := 0; // position in the result string
    len := FS.Read(Block, SizeOf(Block));
    while len <> 0 do
    begin
      for i := 0 to len - 1 do
      begin
        B := Block[i];
        PS[pos] := Nibbs[B shr 4];  // high nibble
        Inc(pos);
        PS[pos] := Nibbs[B and $F]; // low nibble
        Inc(pos);
      end;
      len := FS.Read(Block, SizeOf(Block));
    end;
  finally
    FS.Free;
  end;
end;

P.S: I'm using AnsiString and PAnsiChar so the code also works with Unicode Delphi. If you happen to be on Delphi 2010, find a way to use this in its current form (AnsiString) so you can skip the conversions.

Cosmin Prund
  • Thanks for this, it does the conversion work as I desired; I modified it for D2009 as you suggested. Anyway, this still doesn't make me certain that this is the most efficient way of doing FileToHex in Delphi, but it could be. – UnDiUdin May 12 '10 at 12:52
  • This is going to take two comments, one comment is too short. Sorry. I wasn't suggesting any changes, I said I used AnsiString and not String so it also works effectively on Delphi 2009 and Delphi 2010. On those platforms the default String type is actually UTF16-encoded Unicode, so it uses 2 bytes per char! AnsiString is the ancient 1-byte-per-char string, so there's no overhead. In Unicode Delphi's notation, one byte converts to two chars when translated to hex, and that occupies 4 bytes! So your 100MB file takes up 400MB in RAM. – Cosmin Prund May 12 '10 at 13:30
  • ... what I was suggesting was to find a way to keep it as AnsiString ALL THE WAY to the stored procedure in the DB! If at any point you do a conversion to UTF16 then you pay the penalty for the 2-byte chars. This is why using text to represent non-text data is such a problem: just imagine your 100MB file converted to a 400MB UTF16 string, then converted to UTF8 (200MB) on the way to the SQL server and converted back to UTF16 (400MB) if the stored procedure wants Unicode text. I don't actually know how text is passed to SQL on the wire, this is just a scare scenario. – Cosmin Prund May 12 '10 at 13:38
  • OK thanks, I blindly replaced AnsiString with string, but in fact it was better to leave it as it is (as you commented). I get this now. Anyway, I am in the domain of tricks; it is not so bad for me because typically people read the files, they don't write them. Anyway, I need to test both this approach and the temp table. The temp table has the advantage of moving only 100MB on the network, not 200 or 400. At least in principle; I don't really know what happens "underground". I will also need to ask for a CLR stored procedure that writes to a file. – UnDiUdin May 12 '10 at 14:00
  • If you already have the code that works with text, give it a try... If you're happy with the performance then no one cares how you made it work. But if you don't have the code, I'd say try anything and everything before sending 200MB chunks of text to the server! To be honest, even 1MB seems like a lot of text. – Cosmin Prund May 12 '10 at 19:29