4

I've searched a lot but couldn't find a proper solution for my problem. I wrote an XML file containing all the episode information for a TV show. It's 38 KB and contains attributes and strings for about 680 variables. At first I simply read it with XmlTextReader, which worked fine on my quad-core PC. But my wife's five-year-old laptop took about 30 seconds to read it. So I thought about multithreading, but I get an exception because the file is already open.

The thread start looks like this:

while (reader.Read())
{
   ...
   else if (reader.NodeType == XmlNodeType.Element)
   {
       if (reader.Name.Equals("Season1"))
       {
           current.seasonNr = 0;
           current.currentSeason = season[0];
           current.reader = reader;
           seasonThread[0].Start(current);
       }
       else if (reader.Name.Equals("Season2"))
       {
           current.seasonNr = 1;
           current.currentSeason = season[1];
           current.reader = reader;
           seasonThread[1].Start(current);
       }
       ...
   }
}

And the parsing method looks like this:

reader.Read();

for (episodeNr = 0; episodeNr < tmp.currentSeason.episode.Length; episodeNr++)
{
    reader.MoveToFirstAttribute();
    tmp.currentSeason.episode[episodeNr].id = reader.ReadContentAsInt();
    ...
}

But it doesn't work...

I pass the reader because I want the 'cursor' to be in the right position. But I also have no clue if this could work at all.

Please help!

EDIT: Guys, where did I write about IE? The program I wrote parses the file. I run it on my PC and on the laptop. No IE at all.

EDIT2: I did some stopwatch measurements and found that parsing the XML file takes only about 200 ms on my PC and 800 ms on my wife's laptop. Is it WPF that is so slow? What can I do?

theknut
  • 4
    I don't think that going multithreaded on 5-year-old hardware will give you any performance gain. You should investigate why it runs so slowly. A 38 KB file should be no problem for a 5-year-old computer. Add some performance counters and see what takes so long. 700 variables aren't that many... – Peter Jun 25 '11 at 18:31
  • I thought about XmlTextReader... is it slow? There are so many other ways to read XML files. – theknut Jun 25 '11 at 18:37
  • I/O is what takes so long. Time how quickly it loads into your wife's Internet Explorer versus yours. It will be noticeably slower in the laptop's browser. – IAbstract Jun 25 '11 at 18:42
  • @theknut, that depends on what exactly you are doing, but as long as you're reading the file only once, it should be at least as fast as other ways to read an XML file in .NET. – svick Jun 25 '11 at 18:44
  • @IAbstract, I don't think loading a 38 KB file should be noticeable, even on an old laptop. – svick Jun 25 '11 at 18:46
  • @svick: not a scientific means, no ... but I have been surprised on occasion at how Internet Explorer, specifically, would be slow at loading an .xml file - especially where you and I would see 38kB as insignificant... – IAbstract Jun 25 '11 at 18:48
  • Guys, where did I write about IE? The program I wrote parses the file. I run it on my PC and on the laptop. No IE at all. – theknut Jun 25 '11 at 19:02
  • 2
    @theknut: IE is simply a test apparatus I use to see how long XML files take to load, especially since IE seems to have worst-case characteristics. – IAbstract Jun 26 '11 at 09:26

4 Answers

3

I agree with most of the comments. Reading a 38 KB file should not take that long. Is something else running on the machine (antivirus, etc.) that could be interfering with the processing?

The amount of time it would take you to create a thread will be far greater than the amount of time spent reading the file. If you could post the actual code used to read the file and the file itself, it might help analyze performance bottlenecks.
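To check this on the laptop itself, one could time both operations with `Stopwatch`. This is only a rough sketch: `ParseTiming`, `TimeParse` and `TimeThreadStart` are names made up for illustration, and a single run like this is noisy, so treat the numbers as an indication rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Xml;

static class ParseTiming
{
    // Times one full pass of XmlReader over the given XML text.
    public static TimeSpan TimeParse(string xml)
    {
        var watch = Stopwatch.StartNew();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Visit every node without doing any work on it.
            }
        }
        watch.Stop();
        return watch.Elapsed;
    }

    // Times creating, starting and joining a thread that does nothing.
    public static TimeSpan TimeThreadStart()
    {
        var watch = Stopwatch.StartNew();
        var thread = new Thread(() => { });
        thread.Start();
        thread.Join();
        watch.Stop();
        return watch.Elapsed;
    }
}
```

Calling `ParseTiming.TimeParse(File.ReadAllText(path))` with the real file, next to `ParseTiming.TimeThreadStart()`, should show whether the parse is really where the 30 seconds go.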

Shiroy
  • but pls don't laugh... :-\ I should mention that I use WPF. But the app is quite small. The UI looks like this http://goo.gl/Ia67u – theknut Jun 26 '11 at 19:10
  • There doesn't seem to be anything glaringly wrong with your code. :( – Shiroy Jul 12 '11 at 18:26
1

I don't think you can parse XML on multiple threads, at least not in a way that would bring performance benefits. To read from some point in the file, you need to know everything that comes before it, if only to know which nesting level you are at.

Your code, if it worked, would do something like this:

main   season1   season2

read
read
skip   read
skip   read
read
skip             read
skip             read

Note that to do “skip”, you need to fully parse the XML, which means you're doing the same amount of work as before on the main thread. The only difference is that you're doing some additional work on the background threads.

Regarding the slowness, just parsing such a small XML file should be very fast. If it's slow, you're most likely doing something else that is slow, or you're parsing the file multiple times.
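For comparison, a single-threaded pass over the whole file could look roughly like the sketch below. The `Season1`/`Season2` element names come from the question; the `Episode` element name and its `id` attribute are assumptions about the file layout, and `SinglePassParser` is just an illustrative name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SinglePassParser
{
    // Reads the document once and collects episode ids per season element.
    public static Dictionary<string, List<int>> ReadEpisodeIds(string xml)
    {
        var result = new Dictionary<string, List<int>>();
        List<int> current = null;

        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element)
                    continue;

                if (reader.Name.StartsWith("Season", StringComparison.Ordinal))
                {
                    // A new season starts: later episodes belong to it.
                    current = new List<int>();
                    result[reader.Name] = current;
                }
                else if (reader.Name == "Episode" && current != null)
                {
                    // "Episode" and "id" are assumed names, not taken from the real file.
                    string id = reader.GetAttribute("id");
                    if (id != null)
                        current.Add(XmlConvert.ToInt32(id));
                }
            }
        }
        return result;
    }
}
```

There is only one reader and one cursor here, so nothing has to be handed over to another thread.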

svick
0

If I am understanding how your .xml file is being used, you have essentially created an .xml database.

If that is correct, I would recommend splitting your XML into several .xml files plus an index .xml document. You could then use LINQ to XML to query a set of data from a specific .xml source.

Of course, this means you will still need to load an .xml file; however, you will be loading significantly smaller files, and you would be able to load the .xml document objects asynchronously, although that is highly discouraged.
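As a rough idea of what such a query might look like, here is a small LINQ to XML sketch. The `Episode` element and its `title` attribute are assumptions about the per-season file layout, and `SeasonQueries` is a hypothetical helper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class SeasonQueries
{
    // Returns the titles of all episodes found in one season document.
    public static List<string> EpisodeTitles(XDocument seasonDoc)
    {
        return seasonDoc.Descendants("Episode")
                        .Select(e => (string)e.Attribute("title")) // null if the attribute is missing
                        .Where(t => t != null)
                        .ToList();
    }
}
```

With one file per season, something like `SeasonQueries.EpisodeTitles(XDocument.Load("Season1.xml"))` would then load and query only that season (the file name is again just an example).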

IAbstract
0

Your XML schema doesn't lend itself to parallelism, since you seem to have node names (Season1, Season2) that contain the same kind of data but must be parsed individually. You could redesign your schema to use the same node name (e.g. Season) and attributes that express the differences in the data (e.g. Number to indicate the season number). Then you can parallelize, e.g. using LINQ to XML and PLINQ:

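// Requires: using System.Linq; using System.Xml.Linq;
// Season is assumed to be a class with Number and Description properties.
XDocument doc = XDocument.Load(@"TVShowSeasons.xml");
var seasonData = doc.Descendants("Season")
                    .AsParallel()
                    .Select(x => new Season()
                    {
                        Number = (int)x.Attribute("Number"),
                        Description = x.Value
                    }).ToList();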
BrokenGlass
  • But this way, the actual parsing occurs in the `Load` method and is not parallel. Converting to `Season` objects should be trivial compared to that. – svick Jun 25 '11 at 19:28