
I have a selection of docx files stored as blob data in hexadecimal. I need to retrieve them so that I can access the text within.

So far, I have converted the hex to string format with the following:

Dim st = BLOB DATA ' placeholder for the blob's hex text
Dim con As String = String.Empty
' x starts at 2, so the first two characters of st are skipped
For x = 2 To st.Length - 2 Step 2
    con &= ChrW(CInt("&H" & st.Substring(x, 2)))
Next

However, if I then save the output from this as a .docx, the file will not open because it is 'corrupt'. I presume that is also why, when I load this string into a MemoryStream and then try to use Novacode.DocX.Load(memoryStream), I get a similar corruption error.

I have tried converting the hex to a byte array in two ways, and they give me different results.

System.Text.Encoding.Default.GetBytes(hex)

I have also tried:

Public Function HexToByteArray(hex As String) As Byte()
    Dim upperBound As Integer = hex.Length \ 2
    If hex.Length Mod 2 = 0 Then
        upperBound -= 1
    Else
        hex = "0" & hex
    End If
    Dim bytes(upperBound) As Byte
    For i As Integer = 2 To upperBound
        bytes(i) = Convert.ToByte(hex.Substring(i * 2, 2), 16)
    Next
    Return bytes
End Function

I then tried converting them both to a memory stream and using them to create a DocX object like so:

Dim doc As DocX = DocX.Load(New MemoryStream(bytes))
Jacob Mason
  • Please show both encode and decode methods. Any reason you are using hex and not binary stream => base64 and back again? – Sam Makin Jan 05 '16 at 14:39
  • docx is not a text format, it's a binary format. Thus, converting it to a string is just plain wrong. Your end result needs to be a byte array. I have flagged your question as a duplicate of a question which addresses exactly that. – Heinzi Jan 05 '16 at 14:39
  • Possible duplicate of [How do I convert a Hexidecimal string to a Byte Array?](http://stackoverflow.com/questions/14970436/how-do-i-convert-a-hexidecimal-string-to-a-byte-array) – Heinzi Jan 05 '16 at 14:39
  • Possible duplicate of [convert file to base64 function output](http://stackoverflow.com/questions/10739264/convert-file-to-base64-function-output) – Sam Makin Jan 05 '16 at 14:40
  • @SamMakin I don't have any control over encoding, I am just trying to decode these. – Jacob Mason Jan 05 '16 at 14:47
  • @Heinzi Thank you for linking that hex to byte array, however my hex isn't hyphen delimited. It's continuous. – Jacob Mason Jan 05 '16 at 14:48
  • @JacobMason: I see. Then don't split on `-`, [split on the length of 2 instead](http://stackoverflow.com/q/8774392/87698). – Heinzi Jan 05 '16 at 15:03

1 Answer


docx is not a text format, it's a binary format. Thus, converting it to a string is just plain wrong. Your end result needs to be a byte array.
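A small sketch of why the string detour damages the data. The module name is made up for illustration, and UTF-8 is picked only as an example; other encodings mangle bytes differently. A single byte above `&H7F`, turned into a character and encoded again, no longer comes back as the same single byte:

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module EncodingDemo
    Sub Main()
        ' One byte of binary data, as it might appear inside a .docx.
        Dim original As Byte = &H80

        ' The detour from the question: byte -> character -> bytes.
        Dim asText As String = ChrW(original)
        Dim roundTripped As Byte() = Encoding.UTF8.GetBytes(asText)

        ' U+0080 takes two bytes in UTF-8, so the data has changed.
        Console.WriteLine(BitConverter.ToString(roundTripped)) ' prints C2-80
    End Sub
End Module
```

Keeping the data as a `Byte()` from start to finish avoids this kind of damage.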

Knowing that, your problem can be split into two simpler problems:

  1. Split your hex string into strings of two characters each. See this SO question for details (or keep your existing loop, which is perfectly fine):

    How to split a string by x amount of characters

  2. Convert those "small" strings, which contain the hexadecimal representation of a byte, into bytes. See this SO question for details:

    How do I convert a Hexidecimal string to a Byte Array?

Combining those two solutions is left as an exercise to the reader. We don't want to spoil all the fun or ruin the learning experience. ;-)
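For readers who would rather see the two steps combined, here is a minimal sketch. The names `HexToBytes` and `HexDemo` are made up for illustration, and the code assumes the blob text is plain continuous hex with an optional leading `0x`. If the data was encoded some other way, this will not produce a valid file.

```vbnet
Imports System

Module HexDemo
    ' Hypothetical helper: converts continuous hex text (optionally
    ' prefixed with "0x") into the bytes it represents.
    Function HexToBytes(hex As String) As Byte()
        ' Drop the "0x" marker; it is notation, not data.
        If hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase) Then
            hex = hex.Substring(2)
        End If
        If hex.Length Mod 2 <> 0 Then
            Throw New ArgumentException("Hex text must have an even number of digits.")
        End If

        Dim result(hex.Length \ 2 - 1) As Byte
        For i As Integer = 0 To result.Length - 1
            ' Step 1: take two characters; step 2: parse them as base 16.
            result(i) = Convert.ToByte(hex.Substring(i * 2, 2), 16)
        Next
        Return result
    End Function

    Sub Main()
        Dim bytes As Byte() = HexToBytes("0x504B0304")
        Console.WriteLine(BitConverter.ToString(bytes)) ' prints 50-4B-03-04
    End Sub
End Module
```

The resulting array can be written with `File.WriteAllBytes` or wrapped in a `MemoryStream` for `DocX.Load`, as in the question. The sample input `50 4B 03 04` is the ZIP signature that a .docx file normally starts with, so checking the first four decoded bytes against it is a quick sanity test.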

Heinzi
  • One last question: the 0x in front of the hex seems to be causing some issues. Should I ignore it? – Jacob Mason Jan 05 '16 at 15:07
  • It's now converted to a byte array, but the file is still corrupt. I have tried `File.WriteAllBytes`, but it doesn't work. – Jacob Mason Jan 05 '16 at 15:55
  • @JacobMason: Then the encoding might not be as simple as you think it is. I would suggest a binary compare of the resulting (decoded) file with the original file to find out where the differences are. Or double-check with the person who wrote the encoding code. – Heinzi Jan 05 '16 at 16:00
  • If we could see how the file was encoded, we might be able to recreate the problem... and actually help. It should *really* just be a case of reversing the operation. – Sam Makin Jan 05 '16 at 16:12
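
The binary comparison suggested in the comments could be sketched as follows. `FirstDifference` and the module name are hypothetical, and the sample arrays merely stand in for the bytes of the decoded file and of a known-good original, which in practice might be read with `File.ReadAllBytes`.

```vbnet
Imports System

Module CompareDemo
    ' Returns the index of the first differing byte, or -1 if the
    ' arrays are identical. A length mismatch counts as a difference
    ' at the end of the shorter array.
    Function FirstDifference(a As Byte(), b As Byte()) As Integer
        Dim common As Integer = Math.Min(a.Length, b.Length)
        For i As Integer = 0 To common - 1
            If a(i) <> b(i) Then Return i
        Next
        If a.Length <> b.Length Then Return common
        Return -1
    End Function

    Sub Main()
        ' Stand-in data: the decoded bytes carry two stray leading zeros.
        Dim decoded As Byte() = {&H0, &H0, &H50, &H4B}
        Dim original As Byte() = {&H50, &H4B, &H3, &H4}
        Console.WriteLine(FirstDifference(decoded, original)) ' prints 0
    End Sub
End Module
```

A difference at index 0 points to a problem at the very start of the data, such as a mishandled prefix, whereas a difference deep inside the file would suggest the encoding itself is not plain hex.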