
Updated

This problem turned out not to be an issue with XmlSerializer or deserialization, but rather an issue with reading the response stream from the remote source. Thanks to Chris Sinclair and Jesse C. Slicer for their help and direction in tracking down the discrepancy I was seeing.


I've been profiling the time it takes XmlSerializer to deserialize a static block of XML data into my custom MyXmlObject class, and the time seems to fluctuate quite dramatically from request to request, despite the data being exactly the same.

I start timing the deserialization at the moment the XML is passed to the serializer.Deserialize() method, and stop the timing as soon as it is finished:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
//initialization stuff and writing to request stream.

using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) {
    //If I run a stopwatch up to this point, I get the round trip time,
    //so I know that the stream has already been received at this point.

    using (Stream stream = response.GetResponseStream()) //In production this stream is received from a remote server.
    {
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        MyXmlObject obj = (MyXmlObject)serializer.Deserialize(stream);

        stopwatch.Stop();
    }
}

My XmlSerializer object is defined as follows:

private static readonly XmlSerializer serializer = new XmlSerializer(typeof(MyXmlObject));

My WCF web service is a Singleton that is multi-threaded.

[ServiceBehavior(
    ConcurrencyMode = ConcurrencyMode.Multiple,
    InstanceContextMode = InstanceContextMode.Single)]

Here are the results I got for 15 concurrent requests to deserialize a static block of XML data that is only 177,792 bytes:

Static XML file of 177,792 bytes

Request   Execution (ms)
1         300
2         302
3         303
4         303
5         368 *High
6         303
7         302
8         242 *Low
9         243
10        244
11        245
12        242
13        243
14        883 *Outlier
15        260

As you can see, it's fairly consistent, but it still fluctuates by about +/- 100ms.

On this relatively small XML file, the fluctuation is minimal, but if I feed it a much larger file (which I will be receiving more often in my WCF web service), it fluctuates much more drastically:

Static XML file of 3,851,199 bytes

Request   Execution (ms)
1         1384
2         2402
3         1715
4         4000 *Outlier
5         1310
6         2132
7         1388
8         1654
9         1183
10        1464
11        2368
12        2752 *High
13        1094 *Low
14        1838
15        1940

So, as you can see, the amount of time it takes to deserialize the XML file fluctuates much more than it did for the smaller file.

I would expect the fluctuation to stay roughly the same (only +/- 100ms), but instead I'm seeing differences of +/- 1200ms or greater.

Furthermore, 2.5s is simply more time than I'm willing to give up for deserializing the XML into C# POD objects. What's interesting is that WCF serializes the same C# POD objects into JSON in almost no time at all when I return the data to the server.

Questions

  • Why are these fluctuations happening?
  • What can I do to minimize these fluctuations, if anything?
  • Is there another type of XML deserialization that might be faster at turning incoming XML of this size into a hierarchy of C# POD objects?
    • Perhaps DataContractSerializer? (A rough sketch of what I mean is below.)
    • Doesn't DataContractSerializer use XmlSerializer under the hood?
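
For reference, this is roughly what I imagine switching to. It's only a sketch: it assumes MyXmlObject could be annotated with [DataContract]/[DataMember] (which I haven't done), and that the incoming XML matches the stricter element-ordering and namespace rules DataContractSerializer enforces.

// Sketch only: assumes MyXmlObject is decorated with [DataContract]/[DataMember]
// attributes and that the XML satisfies DataContractSerializer's stricter rules.
// Requires a reference to System.Runtime.Serialization.
private static readonly DataContractSerializer dcSerializer =
    new DataContractSerializer(typeof(MyXmlObject));

using (Stream stream = response.GetResponseStream())
{
    // ReadObject is DataContractSerializer's counterpart to Deserialize.
    MyXmlObject obj = (MyXmlObject)dcSerializer.ReadObject(stream);
}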

Please let me know if there is any information needed that I've left out that would be useful to diagnosing this problem. Thanks.

crush
  • I can imagine there might be other forces at play. Being through a WCF service, especially for larger files, there might be more fluctuation simply due to HTTP/networking transfer. You also mention that these are "concurrent requests"; if you mean that they're happening simultaneously then that can also influence timing as they compete for resources or your WCF service queues requests to work on. – Chris Sinclair Aug 23 '13 at 14:32
  • I would suggest you try measuring times simply for _transferring_ the raw stream data/text _without_ deserializing and see if you see a difference. In my experience, deserializing with the XmlSerializer is very fast (on the order of fractions of a second, nowhere near 2 to 4 seconds), even for ~4mb files. – Chris Sinclair Aug 23 '13 at 14:33
  • Also, I'm assuming you're not implementing `IXmlSerializable`; I've accidentally botched that implementation once and introduced a significant bottleneck reading the data. (once fixed, the time to read was negligible) – Chris Sinclair Aug 23 '13 at 14:35
  • @ChrisSinclair By "due to HTTP/networking transfer" is that to say that in the line `(MyXmlObject)serializer.Deserialize(stream)` the entire stream has not been received from the remote source yet? – crush Aug 23 '13 at 14:37
  • @ChrisSinclair I measure the times for transferring the XML. However, notice how I start and stop the stopwatch before and after the `Deserialize` call. – crush Aug 23 '13 at 14:38
  • @ChrisSinclair Does `IXmlSerializable` need to be implemented on all the POD objects? I am not implementing it. If I implement it, then I have to define the `ReadXml` and `WriteXml` methods; if I don't implement it, it seems to be handled automatically? Are you saying speed will increase if I implement it myself? – crush Aug 23 '13 at 14:39
  • IIRC, the act of reading the stream (say by feeding it into the `XmlSerializer.Deserialize`) is what transfers it; not simply getting a reference to the stream itself. And then, over the network, it won't necessarily read the whole stream at once; it will (I believe) transfer in chunks. As it stands, I believe the way you're measuring now is measuring _both_ transferring the raw XML _and_ deserializing it. Instead try reading the stream into a `string` variable in memory, closing the stream, _then_ deserializing that string and measure _both_ to identify your bottleneck. – Chris Sinclair Aug 23 '13 at 14:43
  • Regarding `IXmlSerializable`, I was investigating probable causes for slowdown. On the contrary, _not_ implementing it and using the default serialization behaviour is typically very fast. It can be difficult to implement the interface manually _correctly_, and thus if you had implemented it, it's probable that a logical oversight in your manual deserialization code was causing the slowdown/fluctuations. Since you are _not_ implementing it, then it shouldn't be considered (and I would _strongly_ recommend against implementing it unless you have a _very good reason_ to do so) – Chris Sinclair Aug 23 '13 at 14:45
  • @ChrisSinclair I get the Stream from HttpWebRequest: `using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) {` then `using (Stream stream = response.GetResponseStream()) {`. If I slap a timer in between these two using statements, measuring the `request.GetResponse()` it tells me how long it takes to get the data from the remote server. – crush Aug 23 '13 at 14:45
  • I'm not too sure about that especially with an `HttpWebResponse`. I feel that the `GetResponse` simply gives you the mechanism in which to _receive_ raw data over HTTP. Otherwise if you're downloading a 4gigabyte file, you would download the entire 4gigs into memory (ruh-oh!) and not be able to read smaller chunks at a time. I could be wrong though; I haven't worked _too heavily_ with these classes. I would still suggest separating the tasks you are measuring so you can be sure you are measuring what you _think_ you are measuring (that is: stream reading vs deserialization) – Chris Sinclair Aug 23 '13 at 14:51
  • @ChrisSinclair You are suggesting that I read the entire `Stream` into a `String` before passing it to `Deserialize` to eliminate the possibility that `Stream` is reading from an open socket? – crush Aug 23 '13 at 14:55
  • (cont.) To me, the `request.GetResponse()` is about _negotiating_ and hitting the server with your _request_. The server has prepared its response, but has not yet _transferred_ that full response back. It's only provided _metadata_ about the response (like "OK! Go ahead and download this big file now." or "Nope, 404 nothing here" or "Ok, there's something here, but I need to authenticate you... oh, ok, you're authenticated. Here's the info for you to download/stream the raw data now.") – Chris Sinclair Aug 23 '13 at 14:55
  • Yes, exactly. Read the 4mb into a string. Measure that. Then turn that string into a valid reader for one of the [`XmlSerializer.Deserialize`](http://msdn.microsoft.com/en-us/library/system.xml.serialization.xmlserializer.deserialize.aspx) overloads (say `MemoryStream`, or an `XmlReader`) and measure that. – Chris Sinclair Aug 23 '13 at 14:57
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/36118/discussion-between-crush-and-chris-sinclair) – crush Aug 23 '13 at 15:02
  • @ChrisSinclair Can you post some type of answer write up so I can give you some reputation? You helped me determine that this wasn't a deserialization issue at all, and I'd like to reward you for that. Thanks. – crush Aug 23 '13 at 15:33
  • Glad to see you're getting closer to finding the culprit. – Chris Sinclair Aug 23 '13 at 15:42

2 Answers


As per the comments, here's a chunk of code that measures purely the deserialization time, rather than the receipt of the response stream over the network. Note that it relies on .NET 4's Stream.CopyTo() method.

var request = (HttpWebRequest)WebRequest.Create(uri);
// initialization stuff and writing to request stream.

using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var memoryStream = new MemoryStream())
{
    // In production this stream is fully received from a remote server.
    stream.CopyTo(memoryStream);
    memoryStream.Seek(0, SeekOrigin.Begin);

    // If I run a stopwatch up to this point, I get the round trip time,
    // so I know that the stream has already been received at this point.
    var stopwatch = Stopwatch.StartNew();

    var obj = (MyXmlObject)serializer.Deserialize(memoryStream);

    stopwatch.Stop();
}
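
If you want both numbers from a single run, a small variation (a sketch only, same setup as above) times the network copy and the deserialization separately:

var request = (HttpWebRequest)WebRequest.Create(uri);
// initialization stuff and writing to request stream.

using (var response = (HttpWebResponse)request.GetResponse())
using (var stream = response.GetResponseStream())
using (var memoryStream = new MemoryStream())
{
    // Time pulling the raw bytes off the wire into memory.
    var transferWatch = Stopwatch.StartNew();
    stream.CopyTo(memoryStream);
    transferWatch.Stop();

    memoryStream.Seek(0, SeekOrigin.Begin);

    // Time only the deserialization of the in-memory bytes.
    var deserializeWatch = Stopwatch.StartNew();
    var obj = (MyXmlObject)serializer.Deserialize(memoryStream);
    deserializeWatch.Stop();

    Console.WriteLine("Transfer: {0} ms, Deserialize: {1} ms",
        transferWatch.ElapsedMilliseconds,
        deserializeWatch.ElapsedMilliseconds);
}
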
Jesse C. Slicer
  • Thanks. I measured it out a bit differently, but was able to determine that ChrisSinclair was correct, and that Deserialization isn't the real culprit. I'm struggling now with understanding why reading of the Stream is taking so long. For example, in SoapUI, I can get the entire raw XML back in 7700ms, dumped to the screen. `request.GetResponse()` takes almost this exact amount of time to complete. However, now it is adding another `1-2s` for retrieving the `Stream` from the remote server, after receiving the response. What I'd like to do is eliminate this `1-2s` of additional processing time. – crush Aug 23 '13 at 15:26
  • 1
    Could this question and any of the answers provide any help to you? http://stackoverflow.com/questions/901323/httpwebresponse-streamreader-very-slow or maybe this article? http://holyhoehle.wordpress.com/2010/01/12/webrequest-slow/ – Jesse C. Slicer Aug 23 '13 at 15:41
  • When I initialize my `HttpWebRequest`, I'm setting `request.Proxy = null;` as I had come across another question that mentioned something similar. I just now implemented the `BufferedStream` as is mentioned in the linked SO question, but it didn't seem to help. The reading of the stream is still taking approximately 1-2s. I will post a new question that focuses on the reading of the Stream issue, since that seems to be the true source of the problem. Thanks! – crush Aug 23 '13 at 15:53

Given your comments and the way you're benchmarking, I don't believe the slow speeds or fluctuations are due to the XML deserialization.

In my experience, using the XmlSerializer for deserialization is pretty fast, and certainly not in the realm of 2-4 seconds for a ~3.5 megabyte file. I'd bet if you were to do another test simply deserializing a static XML string without any of the networking you'd find it to be very, very fast (I'd wager less than 50ms; certainly nowhere near 2000ms).
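
For example, something like this (a rough sketch; "sample.xml" is just a placeholder for a local copy of your payload) would measure the deserializer in isolation, with no networking involved:

// Sketch: "sample.xml" stands in for a local copy of the ~3.5 MB payload.
var serializer = new XmlSerializer(typeof(MyXmlObject));
var xml = File.ReadAllText("sample.xml");

// Only the deserialization of the in-memory string is timed.
var stopwatch = Stopwatch.StartNew();
using (var reader = new StringReader(xml))
{
    var obj = (MyXmlObject)serializer.Deserialize(reader);
}
stopwatch.Stop();

Console.WriteLine("Deserialize only: {0} ms", stopwatch.ElapsedMilliseconds);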

However, with the code you posted, I believe there are networking issues or other factors impacting the time. If you alter your benchmarking tests to separate the stream reading from the XML deserialization, you'll likely find that the deserialization itself contributes negligibly to the overall time.

I suspect that other networking issues are in play here with the WCF services. I would suggest posting a new question that trims out the XML deserialization altogether and simply measures the speed of reading the raw data stream. Someone there may be able to give you a better idea of how to optimize/configure/improve that aspect, since serialization is not the issue here.

Chris Sinclair
  • I wish I could extend more reputation to the both of you. I was convinced this was a problem with deserialization, and until you pointed out to me that I needed to implement more fine-grained profiling before I could be sure of that, I would have continued pursuing that line of thinking. I'm now seeing that it is taking 1-2s to read the incoming stream of data, after resolving the `HttpWebRequest`. I will post a new question. Thanks for the help! – crush Aug 23 '13 at 15:54
  • @crush: Actually, that kind of makes sense (at least in my mind). If there are networking negotiation issues (like proxy detection) that Jesse pointed out, contributing to the ~7.5 second initial request+`GetResponse()`, 1-2 seconds to _download/transfer_ ~3.5 megabytes over HTTP sounds quite plausible. – Chris Sinclair Aug 23 '13 at 16:01
  • What I'm trying to determine is if `request.GetResponse()` makes one trip to the remote server, and then if `response.GetResponseStream()` makes a second trip to the remote server. I've always been under the assumption that the response payload was embedded in the HTTP response. I'm hunting for that answer now. – crush Aug 23 '13 at 16:06
  • @crush Have you used Fiddler before? If not, I highly recommend it: http://fiddler2.com/install . You can see all the connections and traffic and timings. If I had to guess [and yes, I am just guessing :)], `GetResponse()` may be doing a HEAD request to get the size of the response, then `GetResponseStream()` does a GET (or POST, whatever you specified) to obtain the rest of the payload based on information in the header. – Jesse C. Slicer Aug 23 '13 at 16:19
  • @JesseC.Slicer Wireshark and/or Chrome Developer Tools have always been my choice, but the WCF Service is running on a remote (inside my network) server. I could install wireshark on that server, but would rather not. Assuming you're correct about the HEAD+POST, I wonder if there is a means to combine these into a single trip. – crush Aug 23 '13 at 16:24
  • @crush you should be able to run those tools on your development workstation and still see what's going on under the hood, no? – Jesse C. Slicer Aug 23 '13 at 16:28
  • On my development workstation, I host the web client (HTML5/Javascript) that makes a request to the WCF service on my network. The WCF service then makes an HttpWebRequest POST out to a remote server outside of our network. So, I would have to capture traffic either on the server hosting the WCF service, or on the default gateway, in order to see the HTTP requests/responses going outside of our network, which are the ones I'd want to capture. Right?