4

I have a SQL Server database that is storing the contents of files in a table. Specifically, there are 2 fields:

  • Contents: varbinary(max) field that always starts with '0x1F.....'

  • FileType: varchar(5) field that has the type of file, such as PDF, docx, etc.

How can I convert the contents back into a file? I am trying to use Coldfusion, if that is possible, to convert it. If not, what are the steps to convert the binary into a file?

I tried the following (assuming a docx filetype) but it didn't produce a valid word file:

<cfset DecodedValue = BinaryDecode(contents,"hex")>
<cffile action="WRITE" output="#DecodedValue#" file="C:\decodedfile.docx">
Neuron
  • 5,141
  • 5
  • 38
  • 59
user1383201
  • 75
  • 1
  • 6
  • You tried `binaryDecode()` but that function converts a string into binary data. You want the opposite. Try using `binaryEncode()` instead. – Miguel-F May 24 '18 at 12:10

2 Answers2

3

Thanks to User Ageax, the first 4 size bytes of 31,-117,8,0 show the content is stored in GZIP format instead.

I first save the content as a gzip then extract the file. My code is as follows:

<cfquery name="getfile" datasource="tempdb">
select content from table
</cfquery>

<cfset FileWrite("C:\mygzipfile.gzip", getfile.content)>

To extract gzip to a file using coldfusion, I used the solution at: http://coldfusion-tip.blogspot.com/2012/04/unzip-gz-file-in-coldfusion.html

user1383201
  • 75
  • 1
  • 6
2

tldr;

The data is already binary, so ditch the binaryX() functions and save the content directly to a file. Read the first few bytes of the binary to verify the file type. In this case, turns out the document was actually stored in GZIP format, not raw DOCX.


Don't be misled by how SSMS chooses to display it. SSMS displays binary in user friendly hex format, but it's still stored as binary. Just write the binary directly to the file, without any BinaryX functions.

<cfset FileWrite("C:\decodedfile.docx", contents)>

Also, check your DSN settings and ensure the "BLOB - Enable binary large object retrieval (BLOB)" setting is enabled, so binary values aren't truncated at 64K (default buffer size).

Update 1:

The FileWrite() code above works correctly IF the "contents" column contains the binary of a valid .docx file. Perhaps the data is being stored differently than we're thinking? Run a query to retrieve the binary of a single document and output the first four bytes. What is the result? Typically, the first four bytes of .docx files should be 80, 75, 3, 4.

<!--- print size and first 4 bytes --->
<cfoutput>
    size in bytes = #arrayLen(qYourQuery.contents)#<br>
    <cfloop from="1" to="4" index="x">
        byte #x# = #qYourQuery.contents[1][x]#<br>
    </cfloop>
</cfoutput>

Update 2:

Closest I could find to 1F 8B 08 is GZIP. Try using probeContentType() on the saved file. What does it report?

<cfscript>
    paths = createObject("java", "java.nio.file.Paths");
    files = createObject("java", "java.nio.file.Files");
    input = paths.get("c:/yourFileName.docx", []);
    writeDump(files.probeContentType(input));
</cfscript>
SOS
  • 6,430
  • 2
  • 11
  • 29
  • Thanks, yes I realized this now. I used the code above but it didn't generate a valid file. I have the BLOB/CLOB enabled in the DSN settings as well. – user1383201 May 24 '18 at 16:36
  • How are you inserting the data into the db? In CF11, it works fine with a 76kb file. Test steps 1) Read in file with `FileReadBinary()` 2) Insert with `cfqueryparam cfsqtype=cf_sql_blob` 3) CFQuery data 4) Save to file with `FileWrite(path, query.contentsColumn)`. – SOS May 24 '18 at 17:06
  • Unfortunately I don't have this information as I just inherited this database and trying to convert this content back to files. – user1383201 May 25 '18 at 00:38
  • Perhaps it's stored differently than we're thinking. 1. Open the saved file with something like notepad. What do you see? 2. Did you try BinaryEncode like MiguelF suggested? – SOS May 25 '18 at 12:14
  • Opening the file in notepad, I can see gibberish: ‹ ì»S°(K´-¸´—mÛ¶mÛ¶mÛ¶mÛk/Û¶mÛï. If I use binaryencode (either hex or base64), I see the associated represented form of the data such as: 1F8B080000000000... – user1383201 May 25 '18 at 15:29
  • "see gibberish" That sounds like binary, which is good. Can you run the snippet in my updated answer? That'll dump the first 1st four bytes of the file, which usually indicate the file type. – SOS May 25 '18 at 16:22
  • Yes, I did that, this is the results: size in bytes = 41961 byte 1 = 31 byte 2 = -117 byte 3 = 8 byte 4 = 0 – user1383201 May 25 '18 at 16:39