
I have a large XML document that needs to be validated against a number of rules and then enriched with additional information. Currently I do this with XDocument and LINQ. The performance (mainly latency) is not good, and I am looking to optimize it. I was thinking of deserializing the XML into a POCO first, then validating and manipulating its properties, and finally serializing it back to XML.

The XML files are generally a few MB in size, so memory is not a big issue. But latency has become a big concern, since I am making multiple passes over the document.

Before I invest time in implementing this, I want to know whether the approach has merit. Will it improve latency?

  • Does this help? https://stackoverflow.com/questions/676274/what-is-the-best-way-to-parse-big-xml-in-c-sharp-code – Klaus Gütter Nov 22 '21 at 10:44
  • If your access can be made strictly sequential (or at least part of your operations can be compartmentalized this way), you can use [streaming](https://learn.microsoft.com/dotnet/standard/linq/perform-streaming-transform-large-xml-documents). For a custom object model, things may or may not be faster -- that would depend entirely on what kind of object model you build and what you do with it. Repeated search operations would still require some sort of indexing first, for example. Because LINQ to XML uses deferred execution, it tends to be pretty good at this sort of thing already. – Jeroen Mostert Nov 22 '21 at 10:48
  • Thanks for this. I am mostly concerned with latency. Memory is not a problem. I can load the whole XML into memory without any issue. – seaofpuppies Nov 22 '21 at 10:51
  • Changing technology may solve your performance problems, but generally, if you have performance problems then it's best to understand them before making changes that may or may not improve matters. If you're using bad algorithms in your processing, like nested-loop joins, then using a different technology to implement the bad algorithm isn't going to make a jot of difference. – Michael Kay Nov 22 '21 at 12:34
  • I agree with @MichaelKay, we need to see a [mcve] to help you. Also take a look at https://ericlippert.com/2012/12/17/performance-rant/ which recommends that, rather than asking whether one piece of code would be faster than another, we should just test these sorts of things ourselves: *you can easily and accurately discover which of two programs is faster by running both yourself and measuring them with a stopwatch.* – dbc Nov 23 '21 at 18:40
  • @dbc with the caveat that you can also very easily come to spurious conclusions if you aren't very careful with your measurement methodology. – Michael Kay Nov 25 '21 at 09:39
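
The points raised in the comments (nested-loop joins, indexing before repeated searches, and measuring rather than guessing) can be illustrated with a rough sketch. Nothing here comes from the asker's code: the document shape (`<customer id="...">` and `<order customerId="...">`), the class name `XmlPassDemo`, and the "orphan order" validation rule are all hypothetical, chosen only to show one way a LINQ to XML pass might be restructured and timed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Xml.Linq;

public static class XmlPassDemo
{
    // Hypothetical sample: every <order customerId="..."/> should refer to
    // an existing <customer id="..."/>.
    public static XDocument BuildSample(int count)
    {
        var customers = Enumerable.Range(0, count)
            .Select(i => new XElement("customer", new XAttribute("id", i)));
        var orders = Enumerable.Range(0, count)
            .Select(i => new XElement("order", new XAttribute("customerId", i)));
        return new XDocument(new XElement("root",
            new XElement("customers", customers),
            new XElement("orders", orders)));
    }

    // Nested-loop version: rescans the <customer> elements for each <order>.
    public static int CountOrphansNested(XDocument doc)
    {
        return doc.Descendants("order").Count(o =>
            !doc.Descendants("customer").Any(c =>
                (string)c.Attribute("id") == (string)o.Attribute("customerId")));
    }

    // Indexed version: one pass builds a set of ids, one pass checks the orders.
    public static int CountOrphansIndexed(XDocument doc)
    {
        var ids = new HashSet<string>(
            doc.Descendants("customer").Select(c => (string)c.Attribute("id")));
        return doc.Descendants("order")
            .Count(o => !ids.Contains((string)o.Attribute("customerId")));
    }

    public static void Main()
    {
        var doc = BuildSample(5000);

        // Time each variant on the same in-memory document.
        var sw = Stopwatch.StartNew();
        int nested = CountOrphansNested(doc);
        Console.WriteLine($"nested:  {sw.ElapsedMilliseconds} ms, orphans = {nested}");

        sw.Restart();
        int indexed = CountOrphansIndexed(doc);
        Console.WriteLine($"indexed: {sw.ElapsedMilliseconds} ms, orphans = {indexed}");
    }
}
```

A single `Stopwatch` run like this includes JIT warm-up and other noise, which is the kind of methodology problem the last comment warns about; repeating each measurement several times, or using a benchmarking harness such as BenchmarkDotNet, would likely give more trustworthy numbers. Whether a POCO model would beat either variant for the real document is something only a measurement on the real workload can show.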
