
I have a large XML file (size: 20 MB). I am calling a method in a loop for different XPaths. In the method I am using xmlDoc.SelectSingleNode for each XPath, but the total process is taking 2 hours. Is there any alternative to SelectSingleNode?
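One common cause of this kind of slowdown is running every lookup against an `XmlDocument`, which is not optimized for repeated XPath queries. A possible alternative, assuming the document is read-only during the lookups, is to load it once into an `XPathDocument` and compile each expression before selecting. The file name and XPaths below are hypothetical placeholders:

```csharp
using System;
using System.Xml.XPath;

class XPathBatchLookup
{
    static void Main()
    {
        // XPathDocument is a read-only, XPath-optimized store;
        // it is typically much faster than XmlDocument for repeated selects.
        var doc = new XPathDocument("input.xml"); // hypothetical file name
        XPathNavigator nav = doc.CreateNavigator();

        // Hypothetical list of XPaths; in the real code there are ~373.
        string[] xpaths = { "/root/header/id", "/root/body/item[1]/name" };

        foreach (string xpath in xpaths)
        {
            // Compile each expression once, and prefer absolute paths over
            // "//element" forms, which scan the whole document on every call.
            XPathExpression expr = XPathExpression.Compile(xpath);
            XPathNavigator node = nav.SelectSingleNode(expr);
            Console.WriteLine(node?.Value ?? "(not found)");
        }
    }
}
```

Even with 373 XPaths, this pattern should run in seconds on a 20 MB document unless the expressions themselves force full-document scans.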

Mads Hansen
user1104946
  • I think you need to rethink your entire logic. 20 MB isn't _that_ much. 2 hours seems ridiculous. Are you looping efficiently? Are you taking unnecessary resources into memory? – maccettura Oct 05 '18 at 15:12
  • I have around 373 XPaths to loop over. Isn't that a lot? A 20 MB file and selecting a single node 373 times. – user1104946 Oct 05 '18 at 15:14
  • Why not take the whole document and deserialize into a class, then just access the class properties as you normally would. If you are selecting that much data just take the whole thing into memory and efficiently traverse it. You could likely get this process down to a few seconds/minutes – maccettura Oct 05 '18 at 15:15
  • the structure is not simple. It has many nested elements. The class would be very complicated. – user1104946 Oct 05 '18 at 15:16
  • The class would just act as a POCO; it's OK if it's complicated. It's way easier to traverse object structures than parsing a 20 MB document 373 times. Like I said, this process could take as little as a minute if you optimize – maccettura Oct 05 '18 at 15:19
  • 2
    There are ways to write efficient and fast XPath statements, and there are ways to write inefficient and extremely slow XPath statements (hint: `//myElement` will be very slow on very large documents). Post some of your XPaths, so that we can help diagnose. Also, you might consider applying an XSLT to generate the desired output, which can take advantage of match expressions and keys and could out-perform simple XPath selects. – Mads Hansen Oct 05 '18 at 15:24
  • 2
    We really need to see a [mcve] to help you. 2 hours sounds extremely long for a simple XPath search; maybe you're doing something that is actually quadratic in the element depth, such as recursively searching for every node, then recursively searching that node's descendants? – dbc Oct 05 '18 at 15:35
  • That being said, you could always try streaming through the file with an `XmlReader`. See e.g. [Parsing big XML file with element names repeated at different levels](https://stackoverflow.com/a/34010683) or [Read Mulitple childs and extract data xmlReader in c#](https://stackoverflow.com/q/38425140/3744182). – dbc Oct 05 '18 at 15:37
  • With performance, the devil is always in the detail. The first bit of detail missing here is, what is the offending XPath expression? But I would also want to look at the big picture. What are you trying to achieve and is there a better algorithm? If the algorithm is right, is C#+XPath the right technology combination to be implementing it? – Michael Kay Oct 05 '18 at 15:46
  • See my answer at the following posting: https://stackoverflow.com/questions/45822054/using-xmlreader-and-xpath-in-large-xml-file-c-sharp – jdweng Oct 05 '18 at 17:15
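The streaming approach suggested in the comments above can be sketched as follows. This makes a single forward pass over the file instead of re-querying a DOM; the file name and the `item` element name are hypothetical placeholders for whatever the real document contains:

```csharp
using System;
using System.Xml;

class StreamingScan
{
    static void Main()
    {
        // XmlReader streams the file in one forward pass, so a 20 MB
        // document is never fully loaded into memory.
        using (XmlReader reader = XmlReader.Create("input.xml")) // hypothetical file name
        {
            while (reader.Read())
            {
                // "item" stands in for the element(s) of interest.
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "item")
                {
                    // Note: this reads the element's text content and
                    // advances the reader past the element.
                    Console.WriteLine(reader.ReadElementContentAsString());
                }
            }
        }
    }
}
```

This trades the flexibility of arbitrary XPath for a single O(n) pass, which tends to win when many values are extracted from one large file.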

0 Answers