2

I am writing a DataTable to xml with the DataTable.WriteXml()-method. This takes a few seconds, which I think is very good.

However, reading the file (38MB size, 26000rows, 36 columns) using the ReadXml()-method takes 3 minutes! I am already using the DataTable.BeginLoadData() method before calling the ReadXml (which made it ~10 seconds faster), but is there anything else I can do to improve performance?

Here is the loading method:

            data = new DataTable();
            data.BeginLoadData();
            data.ReadXml(filedialog.FileName);
            data.EndLoadData();

I have used breakpoints and stopwatch to see that it is actually the ReadXml method that is the time consuming part.

Here is the XML scheme:

<?xml version="1.0" standalone="yes"?>
<xs:schema id="NewDataSet" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="NewDataSet" msdata:IsDataSet="true" msdata:MainDataTable="ReportTable" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="ReportTable">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="Task" type="xs:int" minOccurs="0" />
              <xs:element name="Project" type="xs:string" minOccurs="0" />
              <xs:element name="DP" type="xs:string" minOccurs="0" />
              <xs:element name="Requester" type="xs:string" minOccurs="0" />
              <xs:element name="TaskType" type="xs:string" minOccurs="0" />
              <xs:element name="OriginalMethod" type="xs:string" minOccurs="0" />
              <xs:element name="Method" type="xs:string" minOccurs="0" />
              <xs:element name="Country" type="xs:string" minOccurs="0" />
              <xs:element name="DayPosted" type="xs:string" minOccurs="0" />
              <xs:element name="DayFinished" type="xs:string" minOccurs="0" />
              <xs:element name="Posted" type="xs:dateTime" minOccurs="0" />
              <xs:element name="JustifiedPosted" type="xs:dateTime" minOccurs="0" />
              <xs:element name="Due" type="xs:dateTime" minOccurs="0" />
              <xs:element name="DueInSec" type="xs:double" minOccurs="0" />
              <xs:element name="AvailableDPWorkTime" type="xs:double" minOccurs="0" />
              <xs:element name="AvailableWorkHours" type="xs:string" minOccurs="0" />
              <xs:element name="StartedDate" type="xs:dateTime" minOccurs="0" />
              <xs:element name="FinishDate" type="xs:dateTime" minOccurs="0" />
              <xs:element name="SecUsed" type="xs:double" minOccurs="0" />
              <xs:element name="MinsUsed" type="xs:double" minOccurs="0" />
              <xs:element name="HoursUsed" type="xs:double" minOccurs="0" />
              <xs:element name="TimeUsed" type="xs:string" minOccurs="0" />
              <xs:element name="Month" type="xs:string" minOccurs="0" />
              <xs:element name="Experience" type="xs:int" minOccurs="0" />
              <xs:element name="Delay" type="xs:string" minOccurs="0" />
              <xs:element name="SecDelay" type="xs:int" minOccurs="0" />
              <xs:element name="TimeDelayed" type="xs:string" minOccurs="0" />
              <xs:element name="PostedVsDue" type="xs:string" minOccurs="0" />
              <xs:element name="Modified" type="xs:dateTime" minOccurs="0" />
              <xs:element name="AvailableDPModWorkTime" type="xs:double" minOccurs="0" />
              <xs:element name="ModifiedAvailableWorkHours" type="xs:string" minOccurs="0" />
              <xs:element name="MasterProject" type="xs:string" minOccurs="0" />
              <xs:element name="Year" type="xs:string" minOccurs="0" />
              <xs:element name="Department" type="xs:string" minOccurs="0" />
              <xs:element name="Customer" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
Kjetil
  • 484
  • 4
  • 15
  • You'll have to show some code, including the schema of your table and which overload of ReadXml() you're calling. I've run a quick test and not noticed any performance problems like this. – Tim Rogers Sep 08 '11 at 12:10
  • I have edited my original post. I hope I have not misunderstood what you meant by schema. – Kjetil Sep 09 '11 at 06:11

1 Answers1

2

I had same problem long ago, solved by setting EnforceConstraints to false, readXml then set it back to true.

Try it out.

Here the docs http://msdn.microsoft.com/en-us/library/s3bxwk8b(v=vs.80).aspx

EDIT:

user is loading data into a DataTable directly, suggested also to load data in a stream so to isolate network/disk performance issue from the pure call to data.ReadXml(...);

ReadXml has aso an overload to read from a stream which could be used after data has been loaded in a Stream.

Davide Piras
  • 43,984
  • 10
  • 98
  • 147
  • I am doing the ReadXml directly to a DataTable. The DataTable does not belong to a DataSet. – Kjetil Sep 09 '11 at 08:34
  • 1
    Ok, seen it, is the file you load on the network or on the local machine? You could try to first load the file in a stream and see how long it takes then you know if the problem is disk/network or data loading. – Davide Piras Sep 09 '11 at 08:40
  • From a local drive, 34739233 bytes took no time at all with a filestream (read it into a byte array for testing). So, instead of using the ReadXml(string FilePath) overload, I tested the ReadXml(Stream filestream) overload. The time it takes to load is now only 20 seconds! It is good enough for me. Should I answer my own question with what I've figured out? – Kjetil Sep 09 '11 at 09:06
  • how much time was before? I suggested you the stream usage :) – Davide Piras Sep 09 '11 at 09:11
  • I will accept this as the answer then.. you should probably edit your answer to reflect upon what I discovered and wrote in comment to the answer, because what you wrote there had nothing to do with the real answer. (I wasn't even using DataSet!) – Kjetil Sep 09 '11 at 10:45