I have an XML file (an SVG, actually) of about 4000 lines, with around 700 individual nodes, each carrying one significant attribute.

I want to preload the data and populate an actual C# runtime model graph with it (some string splitting and property setting).

I did it with `XmlDocument`; the process took 12 sec (in Unity Editor Play Mode).

I then began reimplementing it with `XmlReader`, but merely iterating through the file took 6 sec (no processing at all, only `Read` and `MoveToNextAttribute` calls).

Is there any way to read the file, err... way faster?

A 6-8 sec launch time is something I cannot accept. The whole process should take around... half a second at most.

user3071284
Geri Borbás
  • `XmlReader` should be extremely fast. Can you post the code taking 6 sec? Also, 4000 lines is quite small. – Jesse Good Jul 04 '15 at 22:58
  • Yep, 4000 lines should not be a problem. It seems the DOCTYPE causes the problem. I'm actually parsing an SVG to obtain tons of polygon data. – Geri Borbás Jul 04 '15 at 22:59
  • I also had the same problem. The first improvement was the one you noticed, i.e. using XmlReader instead of XmlDocument. The second was to use multithreading to process the first-level nodes in different threads (as many threads as the computer has cores). – Graffito Jul 04 '15 at 22:59
  • What is the size of the XML file in kilobytes? – Graffito Jul 04 '15 at 23:00
  • It seems it was the `DOCTYPE` definition; I posted it as an answer below. – Geri Borbás Jul 04 '15 at 23:13

2 Answers

2

I simply removed the DOCTYPE definition from the file itself (commented out below).

That makes loading roughly 8 times faster, even with the original `XmlDocument` implementation.

<!-- <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd"> -->
Geri Borbás
  • Maybe I can even stick with the `XmlDocument` implementation? Gonna look into it. – Geri Borbás Jul 04 '15 at 23:07
  • I see. That references an external DTD. Most likely it is being downloaded, which takes time. There is info [here](http://stackoverflow.com/questions/215854/prevent-dtd-download-when-parsing-xml). – Jesse Good Jul 04 '15 at 23:10
  • I checked some XML files loaded by my applications. They have no DOCTYPE and load in a few seconds at a size of 20 MB. – Graffito Jul 04 '15 at 23:12
  • Now it takes `1.06` sec instead of `7.8`, all using `XmlDocument`. The document is `334 KB`. I'm thinking of lazy-parsing the leaf nodes, only when they are really needed. – Geri Borbás Jul 04 '15 at 23:14
  • Not parsing the actual polygons upfront spares another `0.6` sec. It is actually – Geri Borbás Jul 04 '15 at 23:22
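
The lazy-parsing idea from the comments above could be sketched roughly as follows. This is only an illustration, not the asker's code: `LazyPolygon` and its members are hypothetical names, and it assumes the significant attribute is an SVG-style `points` list of numbers separated by spaces and/or commas. It avoids `Lazy<T>` on purpose, on the assumption that older Unity runtimes may not provide it.

```csharp
using System;
using System.Globalization;

// Hypothetical model class: stores the raw attribute text and parses it on first access.
public sealed class LazyPolygon
{
    private readonly string rawPoints;
    private float[] coordinates;   // stays null until Coordinates is first read

    public LazyPolygon(string rawPoints)
    {
        this.rawPoints = rawPoints;
    }

    // True once the raw text has actually been parsed.
    public bool IsParsed { get { return coordinates != null; } }

    // Flat array x0, y0, x1, y1, ... parsed on first use and cached afterwards.
    public float[] Coordinates
    {
        get
        {
            if (coordinates == null)
                coordinates = Parse(rawPoints);
            return coordinates;
        }
    }

    private static float[] Parse(string text)
    {
        // Assumption: numbers are separated by whitespace and/or commas.
        string[] tokens = text.Split(
            new[] { ' ', ',', '\t', '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);

        var result = new float[tokens.Length];
        for (int i = 0; i < tokens.Length; i++)
            result[i] = float.Parse(tokens[i], CultureInfo.InvariantCulture);
        return result;
    }
}
```

At load time only `new LazyPolygon(attributeValue)` runs for each node, so the string splitting cost is paid later, and only for polygons whose `Coordinates` are actually read.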
1

Alternatively, you can bypass DTD validation using the settings argument:

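using System.Xml;

// A <!DOCTYPE ...> that references an external DTD makes the parser resolve (and possibly download) it; skip that.
XmlReaderSettings settings = new XmlReaderSettings();
settings.ValidationType = ValidationType.None;   // no DTD or schema validation
settings.XmlResolver = null;                     // never fetch external resources
settings.DtdProcessing = DtdProcessing.Ignore;   // ignore the DOCTYPE declaration

XmlDocument doc = new XmlDocument();
using (XmlReader reader = XmlReader.Create(filePath, settings))
{
    doc.Load(reader);   // dispose the reader (and the file handle) when done
}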
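
The same settings should also apply when streaming with `XmlReader` directly instead of building an `XmlDocument`. Below is a minimal sketch under stated assumptions: `SvgAttributeScanner`, `ReadAttributeValues`, and the element and attribute names passed in are hypothetical and chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class SvgAttributeScanner
{
    // Collects the value of one attribute from every element with the given local name.
    // elementName and attributeName are illustrative, e.g. "polygon" and "points".
    public static List<string> ReadAttributeValues(
        TextReader input, string elementName, string attributeName)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.None,   // no validation
            XmlResolver = null,                     // no external lookups
            DtdProcessing = DtdProcessing.Ignore    // ignore the DOCTYPE
        };

        var values = new List<string>();
        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (reader.Read())
            {
                // Only element start tags with the requested name are of interest.
                if (reader.NodeType != XmlNodeType.Element || reader.LocalName != elementName)
                    continue;

                string value = reader.GetAttribute(attributeName);
                if (value != null)
                    values.Add(value);
            }
        }
        return values;
    }
}
```

A call might look like `SvgAttributeScanner.ReadAttributeValues(File.OpenText(filePath), "polygon", "points")`. Whether streaming is noticeably faster than `XmlDocument` for a file of this size would need to be measured.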