Basically, I want to extract the stream from the XmlReader and directly base64 decode it to a file.

The structure of the XML file can be seen here. To get the value I have to use ReadInnerXml(). Is it possible to use ReadValueChunk instead?

Here is my current code:

using (XmlReader reader = XmlReader.Create("/your/path/47311.xml"))
{
    while(reader.Read())
    {
        if (reader.IsStartElement () && reader.NodeType == XmlNodeType.Element) {
            switch (reader.Name) {
            case "ttOutputRow":
                reader.ReadToDescendant ("cKey");
                switch (reader.ReadInnerXml ()) {
                case "findMe":
                    reader.ReadToNextSibling ("cValue");
                    // here begins the interesting part
                    char[] buffer = new char[4096];
                    int charRead;
                    using (var destStream = File.OpenWrite ("/your/path/47311.jpg")) {
                        while ((charRead = reader.ReadValueChunk (buffer, 0, 4096)) != 0) {
                            byte[] decodedStream = System.Convert.FromBase64String (new string (buffer));
                            await destStream.WriteAsync(decodedStream, 0, decodedStream.Length);
                            Console.WriteLine ("in");
                        }

                    }
                    break;
                default:
                    break;
                }
                break;
            default:
                break;
            }
        }
    }
}

Currently, it doesn't read the value.

Can't I use ReadValueChunk for this? How can I directly use the stream from the XmlReader without sacrificing too much memory?

Edit:

Following dbc's suggestion, I modified my code. This is what I currently use:

using (XmlReader reader = XmlReader.Create("test.xml"))
{
    while(reader.Read())
    {
        if (reader.IsStartElement () && reader.NodeType == XmlNodeType.Element) {
            switch (reader.Name) {
            case "ttOutputRow":
                reader.ReadToDescendant ("cKey");
                switch (reader.ReadInnerXml ()) {
                case "findMe":
                    reader.ReadToNextSibling ("cValue");
                    byte[] buffer = new byte[40960];
                    int readBytes = 0;
                    using (FileStream outputFile = File.OpenWrite ("test.jpg")) 
                    using (BinaryWriter bw = new BinaryWriter(outputFile))
                    {
                        while ((readBytes = reader.ReadElementContentAsBase64(buffer, 0, 40960)) > 0) {
                            bw.Write (buffer, 0, readBytes);
                            Console.WriteLine ("in");
                        }

                    }
                    break;
                default:
                    break;
                }
                break;
            default:
                break;
            }
        }
    }
}

Here you can find a test file. The real file is a little bit bigger and therefore takes much more time.

The code above doesn't work as expected. It is very slow, and the extracted image is mostly black (corrupted).

testing
  • I just tried your updated code on the test file you provided, and it works fine. The image is of some guy surfing. To test performance, be sure you're running outside of Visual Studio on a release build. Also, writes to the console can be very slow. – dbc May 08 '15 at 07:57
  • That's interesting. I'm using Xamarin Studio on Mac for my application, and there the image isn't decoded correctly. Why isn't it working for me? I can upload the problematic image if you don't believe me. I'll try it without writing to the console next time (but currently I'm working on another project). But performance shouldn't be that bad in a debug configuration. I'll keep you informed. – testing May 08 '15 at 08:04
  • Now I tried it with a console project, and there it was much faster, but the image is still corrupt. I asked a colleague, and the same code works for him (he is using Xamarin Studio on Windows). So it seems to be a bug. – testing May 08 '15 at 10:27

1 Answer


In order to give a definitive answer to your question I would need to see the XML you are trying to read. However, two points:

  1. According to the documentation for Convert.FromBase64String:

    The FromBase64String method is designed to process a single string that contains all the data to be decoded. To decode base-64 character data from a stream, use the System.Security.Cryptography.FromBase64Transform class.

    Thus your problem may be with decoding the content in chunks rather than with reading it in chunks.

  2. You can use XmlReader.ReadElementContentAsBase64 or XmlReader.ReadElementContentAsBase64Async for exactly this purpose. From the docs:

    This method reads the element content, decodes it using Base64 encoding, and returns the decoded binary bytes (for example, an inline Base64-encoded GIF image) into the buffer.

    In fact, the example in the documentation demonstrates how to extract a base64-encoded image from an XML file and write it to a binary file in chunks.
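Both points can be sketched in a few lines. This is a minimal, untested-on-Mono sketch rather than a definitive implementation: the class name `Base64ElementDemo`, the helpers `CopyBase64Element` and `DecodeBase64Stream`, and the in-memory XML are all hypothetical and only borrow the `cKey`/`cValue` element names from the question. It assumes the base64 text is the only content of the element.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

static class Base64ElementDemo
{
    // Point 2 (hypothetical helper): advance to the first element called
    // elementName and stream its base64-decoded content into destination.
    // Returns the number of decoded bytes, or -1 if the element was not found.
    public static long CopyBase64Element(XmlReader reader, string elementName, Stream destination)
    {
        if (!reader.ReadToFollowing(elementName))
            return -1;

        byte[] buffer = new byte[4096];
        long total = 0;
        int read;
        // Each call decodes the next chunk; 0 means the element is exhausted.
        while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
        {
            // Write only the bytes actually returned, not the whole buffer.
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    // Point 1 (hypothetical helper): decode a stream of base64 text without
    // building one big string. Disposing the CryptoStream also closes base64Text.
    public static void DecodeBase64Stream(Stream base64Text, Stream destination)
    {
        using (var transform = new FromBase64Transform())
        using (var decoder = new CryptoStream(base64Text, transform, CryptoStreamMode.Read))
        {
            decoder.CopyTo(destination);
        }
    }

    static void Main()
    {
        // Made-up payload standing in for the image bytes.
        byte[] payload = Enumerable.Range(0, 10000).Select(i => (byte)(i % 251)).ToArray();
        string base64 = Convert.ToBase64String(payload);
        string xml = "<ttOutputRow><cKey>findMe</cKey><cValue>" + base64 + "</cValue></ttOutputRow>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        using (var output = new MemoryStream())
        {
            long written = CopyBase64Element(reader, "cValue", output);
            Console.WriteLine("XmlReader round trip ok: "
                + (written == payload.Length && output.ToArray().SequenceEqual(payload)));
        }

        using (var output = new MemoryStream())
        {
            DecodeBase64Stream(new MemoryStream(Encoding.ASCII.GetBytes(base64)), output);
            Console.WriteLine("FromBase64Transform round trip ok: "
                + output.ToArray().SequenceEqual(payload));
        }
    }
}
```

To write to a file instead, pass a `FileStream` (for example from `File.Create`) as `destination`. Note that the loop uses the returned count for every write, whereas `new string (buffer)` in the question decodes the whole buffer regardless of how many characters `ReadValueChunk` actually returned.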

dbc
  • Now I tried using `ReadElementContentAsBase64`. This function seems to be very slow. It doesn't matter what buffer size I choose (the bigger the faster, but still too slow). The image is not correctly decoded. I can see the beginning of the image, but the rest (the main part of the image) stays black. I've updated my question including a test file. I also tried the code 1:1 from the example, but the result stays the same. – testing May 08 '15 at 07:07