So, I am editing a Word document using OpenXML, and for various reasons I convert its content into a string:

// convert the byte array to a MemoryStream
using (var file = new MemoryStream(text))
using (var reader = new StreamReader(file))
{
    WordprocessingDocument wordDoc = WordprocessingDocument.Open(file, true);
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();
    }
}

And then I convert it back to a byte array.

But a simple conversion does not work:

byte[] back2Byte = System.Text.Encoding.ASCII.GetBytes(docText );

Because the string is an OpenXML string.

I tried this, but I always got a corrupted file when I opened it in Word:

var repo = new System.IO.MemoryStream(System.Text.Encoding.UTF8.GetBytes(docText));

byte[] buffer = new byte[16 * 1024];
MemoryStream ms = new MemoryStream();

int read;
while ((read = repo.Read(buffer, 0, buffer.Length)) > 0)
{
    ms.Write(buffer, 0, read);
}

byte[] back2Byte = ms.ToArray();

This doesn't work either:

byte[] back2Byte = new byte[docText.Length * sizeof(char)];
System.Buffer.BlockCopy(docText.ToCharArray(), 0, back2Byte, 0, back2Byte.Length);

Edit: After some checking, it seems the content is written to the database as a plain OpenXML (XML) document, so Word cannot read it. There is no error when I open it with Notepad.

How can I correct this?

So the real issue is: how can I convert an OpenXML string to a byte array that can be opened in Word?

Patrick Hofman
provençal le breton
  • Will a byte array serve your purposes? – Pseudonym May 05 '14 at 14:08
  • Yes, because I store it in a DB as a BLOB, so it is a byte array in C#. – provençal le breton May 05 '14 at 14:09
  • I suspect that you're not encoding the data properly in reading the stream into `docText`. What does that string look like? Strings can't store arbitrary data unless you use an encoding designed for that, like base64. See http://haacked.com/archive/2012/01/30/hazards-of-converting-binary-data-to-a-string.aspx/ – Tim S. May 05 '14 at 14:12
  • @TimS. The string is in OpenXML format, and that seems to be the issue: the bytes are written in XML format, so Word cannot open the result. So the real issue is, how can I convert an OpenXML string to a byte array that can be opened in Word? – provençal le breton May 05 '14 at 14:38
  • This is all kinds of wrong. You cannot encode a Unicode string as an ASCII string. It's impossible. There is no conversion that would allow that. And it's foolish to try and change your data so it fits your storage system. Change your storage system so it fits your data. You also need to move away from the idea of bytes - you are dealing with *characters* here. The database will support the notion of characters, just use the right data type. – Tomalak May 05 '14 at 14:57
  • @Tomalak I convert to bytes because I use a BLOB field in the database. What field type should I use instead? In my program, I take a file stored as a BLOB, get it as a byte[], convert it into a MemoryStream, and in the end into an OpenXML string (with the first part of the code in the question). I can do it in this direction, so is it really impossible to reverse the process? – provençal le breton May 05 '14 at 15:15
  • It depends on your database system. CLOB would be right for Oracle, `NVARCHAR(MAX)` or `NTEXT` for SQL Server, other DBMS may call it differently. BLOB is for storing real binary data (images, for example), there are alternatives for arbitrary amounts of character-based data in every database system. – Tomalak May 05 '14 at 15:20
  • @Tomalak Thank you. Will I be able to open the file with Word the same way I was opening the BLOB (or as easily), using an aspx page? I need to open the document in my database from Silverlight (without ActiveX). – provençal le breton May 05 '14 at 17:31
  • I don't understand that question. You have a string. When you store that string cleanly in a database, you will be able to retrieve and open it in any way you like. – Tomalak May 05 '14 at 17:36

1 Answer

You cannot do it this way. You are getting the bytes for only one part of an OpenXML document. Office Open XML files (.docx, .xlsx, .pptx) are multi-part packages: a ZIP archive containing several XML parts plus the relationships between them. You could theoretically capture the bytes for all the parts using a technique like the one you're currently using, but you would also have to capture all the part/relationship information necessary to reconstruct the multi-part document. You'd be better off just reading all the bytes of the file and storing them as-is:

// to read the file as bytes
var fileName = @"C:\path\to\the\file.xlsx";
var fileBytes = File.ReadAllBytes(fileName);

// to recreate the file from the bytes
File.WriteAllBytes(fileName, fileBytes);

If you need a string form of those bytes, try this:

// to convert bytes to a (non-readable) text form
var fileContent = Convert.ToBase64String(fileBytes);

// to convert base-64 back to bytes
var restoredBytes = Convert.FromBase64String(fileContent);

Either way, there is absolutely no need to use the OpenXML SDK for your use case.
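To illustrate why a single part is not a whole document, here is a minimal sketch that lists the parts inside a package using only `System.IO.Compression`. The names `PackageInspector` and `ListParts` are hypothetical, introduced only for this example, and the sketch assumes the bytes are a valid .docx or .xlsx file.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

public static class PackageInspector
{
    // Hypothetical helper: returns the name of every part (ZIP entry)
    // stored inside an Office Open XML package.
    public static List<string> ListParts(byte[] packageBytes)
    {
        var names = new List<string>();

        // An Office Open XML file is a ZIP archive, so ZipArchive can read it.
        using (var stream = new MemoryStream(packageBytes))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in zip.Entries)
            {
                names.Add(entry.FullName);
            }
        }

        return names;
    }
}
```

For a typical .docx, the list includes entries such as `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`. The string read in the question is only the last of these, which is why writing it back on its own produces a file Word cannot open.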

Michael Gunter
  • But I cannot convert to a non-readable text, because I must replace some text in it. Currently I have a document stored as a BLOB in my DB; I get it as a byte[], convert it to a readable string to make changes in the text, then convert it back to bytes and store it in the database again. But according to Tomalak, this is not a good way to do it (BLOB and bytes...). I had just heard of OpenXML, and thought I could do what I explained more easily. – provençal le breton May 06 '14 at 07:12
  • You want to replace some text within the body of the document? OK, you can do that, but you need to use the OpenXML SDK to extract the string and write it back to a `WordprocessingDocument` when you are done. You cannot store only a single part of an OpenXML document and expect it to work. – Michael Gunter May 06 '14 at 13:25
  • And once it is written back into the `WordprocessingDocument`, I must store it in my database again. Can I do this? – provençal le breton May 06 '14 at 13:49
  • When you programmatically modify an OpenXML document, the changes are automatically saved into the underlying file/stream (unless AutoSave is false) when the document is disposed. Add a `using` clause around the `wordDoc` variable. After that `using` block, the `MemoryStream` referenced by your `file` variable has the modified content. Just seek to the beginning of that stream and read out the bytes. – Michael Gunter May 06 '14 at 14:12
  • More information: http://msdn.microsoft.com/en-us/library/office/ee945362%28v=office.11%29.aspx – Michael Gunter May 06 '14 at 14:12
  • Thanks for your help. I will check this and look more deeply into how it works. – provençal le breton May 06 '14 at 14:32
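
The round trip described in the comments above (open the package in a `using` block, edit the main part, let disposal save the changes, then read the bytes from the stream) might look roughly like the sketch below. `DocxTextReplacer` and `ReplaceInMainPart` are hypothetical names; the sketch assumes the Open XML SDK (`DocumentFormat.OpenXml`) is referenced and that the BLOB holds the complete .docx file. It copies the bytes into an expandable `MemoryStream`, because a stream created with `new MemoryStream(byte[])` cannot grow if the edited document gets larger.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class DocxTextReplacer
{
    // Hypothetical helper: takes the full .docx bytes (e.g. loaded from the BLOB),
    // replaces text in the main document part, and returns the full .docx bytes.
    public static byte[] ReplaceInMainPart(byte[] docxBytes, string oldValue, string newValue)
    {
        using (var stream = new MemoryStream())
        {
            // Copy into a resizable stream so the package can grow on save.
            stream.Write(docxBytes, 0, docxBytes.Length);
            stream.Position = 0;

            // Disposing the document flushes the changes into 'stream'.
            using (var wordDoc = WordprocessingDocument.Open(stream, true))
            {
                MainDocumentPart mainPart = wordDoc.MainDocumentPart;

                string docText;
                using (var reader = new StreamReader(mainPart.GetStream()))
                {
                    docText = reader.ReadToEnd();
                }

                // Plain string replacement on the raw XML of the main part.
                docText = docText.Replace(oldValue, newValue);

                // FileMode.Create truncates the part before the new XML is written.
                using (var writer = new StreamWriter(mainPart.GetStream(FileMode.Create)))
                {
                    writer.Write(docText);
                }
            }

            // The stream now holds the whole package, not just one part.
            return stream.ToArray();
        }
    }
}
```

The returned array is the whole ZIP package, so it can be stored in the BLOB column as-is and opened by Word. Note that a raw string replacement can miss text that Word has split across several runs, and the replacement value must be valid XML text; both are limitations of this sketch.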