32

We have reasonably large XML strings which we currently parse using MSXML2.

I have just tried MSXML6, hoping for a speed improvement, and saw none.

We currently create a lot of DOM documents, and I guess there may be some overhead in constantly calling into the MSXML2/6 DLL.

Does anyone know of a better/faster XML component for Delphi?

If anyone can suggest an alternative that is faster, we would look to integrate it. That would be a lot of work, though, so ideally its structure would not be too different from that used by MSXML.

We are using Delphi 2010.

Paul

Paul
  • I see a mod has deleted the comments that contained valuable information. Why? – David Heffernan Feb 29 '12 at 22:57
  • @DavidHeffernan, was there still any useful info that wasn't part of any answers? Long comment threads are always subject to deletion if the reviewing moderator doesn't see the value. – Michael Myers Feb 29 '12 at 23:34
  • @MichaelMyers Well I thought there was useful information, but I guess the mods disagreed. I really don't see how deleting the comments makes things better. If someone is really interested in this question then they will take the time to read everything. – David Heffernan Mar 01 '12 at 22:46
  • I agree, it is a bit annoying that a lot of the comments were deleted. I found it all useful to get an overall idea. Luckily I hadn't closed a tab showing this page before the moderation, so I was able to keep hold of some of the comments. – Paul Mar 02 '12 at 09:06

5 Answers

33

Some time ago I had to serialize a record to XML, for example:

 TTest = record
    a : integer;
    b : real; 
 end;

to

    <Data>
        <a type="tkInteger">value</a>
        <b type="tkFloat">value</b>
    </Data>

I used RTTI to navigate recursively through the record fields and store their values as XML. I tried a few XML parsers. I didn't need a DOM model to create the XML, but I did need one to load it back.
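A minimal sketch of that idea for a flat record, assuming the `Rtti` unit that ships with Delphi 2010. `RecordToXml` is a hypothetical helper, not the code from the test project: it does not recurse into nested records and does not escape XML special characters.

```pascal
program RttiRecordToXml;

{$APPTYPE CONSOLE}

uses
  SysUtils, TypInfo, Rtti;

type
  TTest = record
    a: Integer;
    b: Real;
  end;

// Hypothetical helper (not from the original answer): emits every field of a
// flat record as <name type="tkXxx">value</name> inside a <Data> element.
function RecordToXml(const Instance: Pointer; const Info: PTypeInfo): string;
var
  Ctx: TRttiContext;
  Field: TRttiField;
  KindName: string;
begin
  Result := '<Data>';
  Ctx := TRttiContext.Create;
  try
    for Field in Ctx.GetType(Info).GetFields do
    begin
      // Turn the field's TTypeKind into its identifier, e.g. tkInteger.
      KindName := GetEnumName(TypeInfo(TTypeKind),
        Ord(Field.FieldType.TypeKind));
      Result := Result + Format('<%0:s type="%1:s">%2:s</%0:s>',
        [Field.Name, KindName, Field.GetValue(Instance).ToString]);
    end;
  finally
    Ctx.Free;
  end;
  Result := Result + '</Data>';
end;

var
  Rec: TTest;
begin
  Rec.a := 42;
  Rec.b := 1.5;
  Writeln(RecordToXml(@Rec, TypeInfo(TTest)));
end.
```

Note that `TValue.ToString` formats floats with the current locale settings, so a real serializer would probably format numbers explicitly.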

The XML contained about 310k nodes (10-15 MB). The results are shown in the table below; it has 6 columns with times in seconds:
1 - time to create the nodes and write the values
2 - SaveToFile();
3 = 1 + 2
4 - LoadFromFile();
5 - time to navigate through the nodes and read the values
6 = 4 + 5
[image: results table]
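As a rough illustration of how columns 4 to 6 could be measured, here is a sketch that times loading and navigating with `TXMLDocument` (MSXML vendor by default) and `TStopwatch` from the `Diagnostics` unit. The program name, `CountNodes`, and the file path passed on the command line are assumptions made for this example; this is not the code from the test project.

```pascal
program TimeXmlLoad;

{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, Diagnostics, XMLIntf, XMLDoc;

// Hypothetical helper: visits every node below Node and returns the node count.
function CountNodes(const Node: IXMLNode): Integer;
var
  I: Integer;
begin
  Result := 1;
  if Node.HasChildNodes then
    for I := 0 to Node.ChildNodes.Count - 1 do
      Inc(Result, CountNodes(Node.ChildNodes[I]));
end;

var
  Doc: IXMLDocument;
  Watch: TStopwatch;
  LoadMs, NavigateMs: Int64;
  Nodes: Integer;
begin
  CoInitialize(nil); // the MSXML vendor is COM based
  try
    Watch := TStopwatch.StartNew;
    Doc := LoadXMLDocument(ParamStr(1)); // column 4: load
    LoadMs := Watch.ElapsedMilliseconds;

    Watch := TStopwatch.StartNew;
    Nodes := CountNodes(Doc.DocumentElement); // column 5: navigate
    NavigateMs := Watch.ElapsedMilliseconds;

    Writeln(Format('load: %d ms, navigate: %d ms, total: %d ms (%d nodes)',
      [LoadMs, NavigateMs, LoadMs + NavigateMs, Nodes]));

    Doc := nil; // release the document before COM is shut down
  finally
    CoUninitialize;
  end;
end.
```

Timings from a single run like this vary with disk caching, so repeated runs would be needed before comparing parsers.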

MSXML/Xerces/ADOM are different vendors for TXMLDocument (DOMVendor).
JanXML doesn't work with Unicode; I fixed some errors and saved the XML, but loading causes an AV (or a stack overflow, I don't remember).
manual means writing the XML by hand using TStringStream.

I used Delphi 2010, Win7 x32, Q8200 CPU/2.3 GHz, 4 GB of RAM.

Update: you can download the source code for this test (record serialization to XML using RTTI) here: http://blog.karelia.pro/teran/files/2012/03/XMLTest.zip All parsers (Omni, Native, Jan) are included (the node count in the XML is now about 270k). Sorry, there are no comments in the code.

teran
  • Nice numbers. I'll just add that NativeXML took specific steps to add a buffered stream to get good performance. I had a large file that was very slow, and afterwards it was as fast as a small file. The measurements above show it is good. – mj2008 Feb 29 '12 at 10:16
  • Note that unless we compare the implementations he made, it is possible that another person using any of the above tools, might have obtained significantly different performance. The SAVE time for OmniXML is 10x slower than EVERYBODY else there. That's suspicious to me. He might be using an expensive (wrong) way of using OmniXML, for instance. – Warren P Feb 29 '12 at 18:03
  • @WarrenP yes OmniXML save-time looks strange, tomorrow I'll upload sources. Actually "10x slower than everybody" is "8x slower than NativeXML" ;). But as I remember `save`-time is simply `SaveToFile`-call time. – teran Feb 29 '12 at 18:37
  • @WarrenP, i've added link to source code in the end of my answer. I've also checked - "Save" time = `SaveToFile` time – teran Mar 01 '12 at 07:51
  • Thanks teran. I think I might profile OmniXML with AQTime and figure out where all those seconds are going. – Warren P Mar 01 '12 at 19:05
  • @WarrenP: the problem is that OmniXML doesn't use buffering when writing to a stream. Thus direct writing to a TFileStream is very slow. Possible solution is to write to a memory stream first and then to file stream. – oxo Dec 12 '13 at 21:44
23

I know that it's an old question, but people might find it interesting:

I wrote a new XML library for Delphi (OXml): http://www.kluug.net/oxml.php

It features direct XML handling (read+write), SAX parser, DOM and a sequential DOM parser. One of the benefits is that OXml supports Delphi 6-Delphi XE5, FPC/Lazarus and C++Builder on all platforms (Win, MacOSX, Linux, iOS, Android).

OXml DOM is record/pointer based and offers better performance than any other XML library:

The read test returns the time the parser needs to read a custom XML DOM from a file (column "load") and to write node values to a constant dummy function (column "navigate"). The file is encoded in UTF-8 and its size is about 5.6 MB.

[image: XML parse comparison]

The write test returns the time the parser needs to create a DOM (column "create") and write this DOM to a file (column "save"). The file is encoded in UTF-8 and its size is about 11 MB.

[image: XML write comparison]

+ The poor OmniXML (original) writing performance was the result of OmniXML not buffering its writes, so writing directly to a TFileStream was very slow. I updated OmniXML and added buffering support. You can get the latest OmniXML code from the SVN.
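For a library version without buffering, the workaround is to serialize into a memory stream first and then write the file in one go. A minimal sketch of that pattern follows; `SaveViaMemoryStream` is a hypothetical helper, and the anonymous method stands in for whatever call the XML library offers for saving to a `TStream`.

```pascal
program BufferedSave;

{$APPTYPE CONSOLE}

uses
  Classes, SysUtils;

// Hypothetical helper: lets any serializer write into memory first and then
// flushes the result to disk with a single large write.
procedure SaveViaMemoryStream(const FileName: string;
  const WriteTo: TProc<TStream>);
var
  Buffer: TMemoryStream;
begin
  Buffer := TMemoryStream.Create;
  try
    WriteTo(Buffer);             // many small writes, all in memory
    Buffer.SaveToFile(FileName); // one write to the file system
  finally
    Buffer.Free;
  end;
end;

begin
  SaveViaMemoryStream('buffered.xml',
    procedure(Stream: TStream)
    var
      Chunk: UTF8String;
      I: Integer;
    begin
      // Stand-in for a library call that saves a document to a stream.
      Chunk := '<Data>';
      Stream.WriteBuffer(Chunk[1], Length(Chunk));
      for I := 1 to 1000 do
      begin
        Chunk := UTF8String(Format('<a>%d</a>', [I]));
        Stream.WriteBuffer(Chunk[1], Length(Chunk));
      end;
      Chunk := '</Data>';
      Stream.WriteBuffer(Chunk[1], Length(Chunk));
    end);
end.
```

The trade-off is that the whole serialized document is held in memory before it reaches the disk, which is usually acceptable for files in the 10 MB range discussed here.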

oxo
12

Recently I had a similar issue where the MSXML DOM parser proved to be too slow for the given task. I had to parse rather large documents (> 1 MB), and the memory consumption of the DOM parser was prohibitive. My solution was to not use a DOM parser at all, but to go with the event-driven MSXML SAX parser. This proved to be much, much faster. Unfortunately the programming model is totally different, but depending on the task, it might be worth it. Craig Murphy has published an excellent article on how to use the MSXML SAX parser in Delphi: SAX, Delphi and Ex Em El

Lars Frische
  • If such an effortful change is on the table and validation is not required, then [XmlLite](http://msdn.microsoft.com/en-us/library/windows/desktop/ms752872%28v=vs.85%29.aspx) may be an option. See the "Usage Scenarios" section of the [XmlLite Introduction](http://msdn.microsoft.com/en-us/library/windows/desktop/ms752838%28v=vs.85%29.aspx) – g2mk Mar 01 '12 at 05:40
  • @Lars, the "SAX, Delphi and Ex Em El" link seems to be dead ;( – rossmcm May 22 '13 at 22:33
  • Then you are welcome to contribute to [The Internet Archive](https://archive.org/) because it helps here (again) keeping a [copy](https://web.archive.org/web/20101224015141/http://www.prototypical.co.uk/pdf/sax.pdf). – AntoineL Mar 27 '19 at 07:56
6

Some time ago I wrote a very simple XML test suite. It covers MSXML (D7 MSXML3?), OmniXML (a bit old) and Jedi XML (latest stable).

Test results for a 1.52 MB file:

XML file loading time MSXML: 240,20 [ms]

XML node selections MSXML: 1,09 [s]

XML file loading time OmniXML: 2,25 [s]

XML node selections OmniXML: 1,22 [s]

XML file loading time JclSimpleXML: 2,11 [s]

and an access violation for the JclSimpleXML node selections :|

Unfortunately I don't currently have the time to fix the above AV, but the sources are included below...

XmlEngines.dpr

program XmlEngines;

uses
  FastMM4,
  Forms,
  fmuMain in 'fmuMain.pas' {fmMain},
  uXmlEngines in 'uXmlEngines.pas',
  ifcXmlEngine in 'ifcXmlEngine.pas';

{$R *.res}

begin
  Application.Initialize;
  Application.Title := 'XML Engine Tester';
  Application.CreateForm(TfmMain, fmMain);
  Application.Run;
end.

fmuMain.pas

unit fmuMain;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
  Dialogs, xmldom, XMLIntf, msxmldom, XMLDoc,
  //
  ifcXmlEngine, StdCtrls;

type
  TfmMain = class(TForm)
    mmoDebug: TMemo;
    dlgOpen: TOpenDialog;

    procedure FormCreate(Sender: TObject);
    procedure FormDestroy(Sender: TObject);

    procedure mmoDebugClick(Sender: TObject);

  private
    fXmlEngines: TInterfaceList;
    function Get_Engine(const aIx: Integer): IXmlEngine;

  protected
    property XmlEngine[const aIx: Integer]: IXmlEngine read Get_Engine;

    procedure Debug(const aInfo: string); // inline

  public
    procedure RegisterXmlEngine(const aEngine: IXmlEngine);

  end;

var
  fmMain: TfmMain;

implementation

{$R *.dfm}

uses
  uXmlEngines, TZTools;

{ TfmMain }

function TfmMain.Get_Engine(const aIx: Integer): IXmlEngine;
begin
  Result:= nil;
  Supports(fXmlEngines[aIx], IXmlEngine, Result)
end;

procedure TfmMain.RegisterXmlEngine(const aEngine: IXmlEngine);
var
  Ix: Integer;
begin
  if aEngine = nil then
    Exit; // WARNING: program flow disorder

  for Ix:= 0 to Pred(fXmlEngines.Count) do
    if XmlEngine[Ix] = aEngine then
      Exit; // WARNING: program flow disorder

  fXmlEngines.Add(aEngine)
end;

procedure TfmMain.FormCreate(Sender: TObject);
begin
  fXmlEngines:= TInterfaceList.Create();
  dlgOpen.InitialDir:= ExtractFileDir(ParamStr(0));
  RegisterXmlEngine(TMsxmlEngine.Create(Self));
  RegisterXmlEngine(TOmniXmlEngine.Create());
  RegisterXmlEngine(TJediXmlEngine.Create());
end;

procedure TfmMain.mmoDebugClick(Sender: TObject);

  procedure TestEngines(const aFilename: TFileName);

    procedure TestEngine(const aEngine: IXmlEngine);
    var
      PerfCheck: TPerfCheck;
      Ix: Integer;
    begin
      PerfCheck := TPerfCheck.Create();
      try

        PerfCheck.Init(True);
        PerfCheck.Start();
        aEngine.Load(aFilename);
        PerfCheck.Pause();
        Debug(Format(
          'XML file loading time %s: %s',
          [aEngine.Get_ID(), PerfCheck.TimeStr()]));

        if aEngine.Get_ValidNode() then
        begin
          PerfCheck.Start();
          for Ix:= 0 to 999999 do
            if aEngine.Get_ChildsCount() > 0 then
            begin

              aEngine.SelectChild(Ix mod aEngine.Get_ChildsCount());

            end
            else
              aEngine.SelectRootNode();

          PerfCheck.Pause();
          Debug(Format(
            'XML nodes selections %s: %s',
            [aEngine.Get_ID(), PerfCheck.TimeStr()]));
        end

      finally
        PerfCheck.Free();
      end
    end;

  var
    Ix: Integer;
  begin
    Debug(aFilename);
    for Ix:= 0 to Pred(fXmlEngines.Count) do
      TestEngine(XmlEngine[Ix])
  end;

var
  CursorBckp: TCursor;
begin
  if dlgOpen.Execute() then
  begin

    CursorBckp:= Cursor;
    Self.Cursor:= crHourGlass;
    mmoDebug.Cursor:= crHourGlass;
    try
      TestEngines(dlgOpen.FileName)
    finally
      Self.Cursor:= CursorBckp;
      mmoDebug.Cursor:= CursorBckp;
    end

  end
end;

procedure TfmMain.Debug(const aInfo: string);
begin
  mmoDebug.Lines.Add(aInfo)
end;

procedure TfmMain.FormDestroy(Sender: TObject);
begin
  fXmlEngines.Free()
end;

end.

ifcXmlEngine.pas

unit ifcXmlEngine;

interface

uses
  SysUtils;

type
  TFileName = SysUtils.TFileName;

  IXmlEngine = interface
    ['{AF77333B-9873-4FDE-A3B1-260C7A4D3357}']
    procedure Load(const aFilename: TFileName);
    procedure SelectRootNode();
    procedure SelectChild(const aIndex: Integer);
    procedure SelectParent();
    //
    function Get_ID(): string;
    function Get_ValidNode(): Boolean;
    function Get_ChildsCount(): Integer;
    function Get_HaveParent(): Boolean;
    //function Get_NodeName(): Boolean;
  end;

implementation

end.

uXmlEngines.pas

unit uXmlEngines;

interface

uses
  Classes,
  //
  XMLDoc, XMLIntf, OmniXml, JclSimpleXml,
  //
  ifcXmlEngine;

type
  TMsxmlEngine = class(TInterfacedObject, IXmlEngine)
  private
    fXmlDoc: XMLDoc.TXMLDocument;
    fNode: XMLIntf.IXMLNode;

  protected

  public
    constructor Create(const aOwner: TComponent);
    destructor Destroy; override;

    procedure Load(const aFilename: TFileName);
    procedure SelectRootNode();
    procedure SelectChild(const aIndex: Integer);
    procedure SelectParent();
    //
    function Get_ID(): string;
    function Get_ValidNode(): Boolean;
    function Get_ChildsCount(): Integer;
    function Get_HaveParent(): Boolean;
    //function Get_NodeName(): Boolean;

  end;

  TOmniXmlEngine = class(TInterfacedObject, IXmlEngine)
  private
    fXmlDoc: OmniXml.IXmlDocument;
    fNode: OmniXml.IXMLNode;

  protected

  public
    constructor Create;
    destructor Destroy; override;

    procedure Load(const aFilename: TFileName);
    procedure SelectRootNode();
    procedure SelectChild(const aIndex: Integer);
    procedure SelectParent();
    //
    function Get_ID(): string;
    function Get_ValidNode(): Boolean;
    function Get_ChildsCount(): Integer;
    function Get_HaveParent(): Boolean;
    //function Get_NodeName(): Boolean;

  end;

  TJediXmlEngine = class(TInterfacedObject, IXmlEngine)
  private
    fXmlDoc: TJclSimpleXML;
    fNode: TJclSimpleXMLElem;

  protected

  public
    constructor Create();
    destructor Destroy(); override;

    procedure Load(const aFilename: TFileName);
    procedure SelectRootNode();
    procedure SelectChild(const aIndex: Integer);
    procedure SelectParent();
    //
    function Get_ID(): string;
    function Get_ValidNode(): Boolean;
    function Get_ChildsCount(): Integer;
    function Get_HaveParent(): Boolean;
    //function Get_NodeName(): Boolean;

  end;

implementation

uses
  SysUtils;

{ TMsxmlEngine }

constructor TMsxmlEngine.Create(const aOwner: TComponent);
begin
  if aOwner = nil then
    raise Exception.Create('TMsxmlEngine.Create() -> invalid owner');

  inherited Create();
  fXmlDoc:= XmlDoc.TXmlDocument.Create(aOwner);
  fXmlDoc.ParseOptions:= [poPreserveWhiteSpace]
end;

destructor TMsxmlEngine.Destroy;
begin
  fXmlDoc.Free();
  inherited Destroy()
end;

function TMsxmlEngine.Get_ChildsCount: Integer;
begin
  Result:= fNode.ChildNodes.Count
end;

function TMsxmlEngine.Get_HaveParent: Boolean;
begin
  Result:= fNode.ParentNode <> nil
end;

function TMsxmlEngine.Get_ID: string;
begin
  Result:= 'MSXML'
end;

//function TMsxmlEngine.Get_NodeName: Boolean;
//begin
//  Result:= fNode.Text
//end;

function TMsxmlEngine.Get_ValidNode: Boolean;
begin
  Result:= fNode <> nil
end;

procedure TMsxmlEngine.Load(const aFilename: TFileName);
begin
  fXmlDoc.LoadFromFile(aFilename);
  SelectRootNode()
end;

procedure TMsxmlEngine.SelectChild(const aIndex: Integer);
begin
  fNode:= fNode.ChildNodes.Get(aIndex)
end;

procedure TMsxmlEngine.SelectParent;
begin
  fNode:= fNode.ParentNode
end;

procedure TMsxmlEngine.SelectRootNode;
begin
  fNode:= fXmlDoc.DocumentElement
end;

{ TOmniXmlEngine }

constructor TOmniXmlEngine.Create;
begin
  inherited Create();
  fXmlDoc:= OmniXml.TXMLDocument.Create();
  fXmlDoc.PreserveWhiteSpace:= true
end;

destructor TOmniXmlEngine.Destroy;
begin
  fXmlDoc:= nil;
  inherited Destroy()
end;

function TOmniXmlEngine.Get_ChildsCount: Integer;
begin
  Result:= fNode.ChildNodes.Length
end;

function TOmniXmlEngine.Get_HaveParent: Boolean;
begin
  Result:= fNode.ParentNode <> nil
end;

function TOmniXmlEngine.Get_ID: string;
begin
  Result:= 'OmniXML'
end;

//function TOmniXmlEngine.Get_NodeName: Boolean;
//begin
//  Result:= fNode.NodeName
//end;

function TOmniXmlEngine.Get_ValidNode: Boolean;
begin
  Result:= fNode <> nil
end;

procedure TOmniXmlEngine.Load(const aFilename: TFileName);
begin
  fXmlDoc.Load(aFilename);
  SelectRootNode()
end;

procedure TOmniXmlEngine.SelectChild(const aIndex: Integer);
begin
  fNode:= fNode.ChildNodes.Item[aIndex]
end;

procedure TOmniXmlEngine.SelectParent;
begin
  fNode:= fNode.ParentNode
end;

procedure TOmniXmlEngine.SelectRootNode;
begin
  fNode:= fXmlDoc.DocumentElement
end;

{ TJediXmlEngine }

constructor TJediXmlEngine.Create;
begin
  inherited Create();
  fXmlDoc:= TJclSimpleXML.Create();
end;

destructor TJediXmlEngine.Destroy;
begin
  fXmlDoc.Free();
  inherited Destroy()
end;

function TJediXmlEngine.Get_ChildsCount: Integer;
begin
  Result:= fNode.ChildsCount
end;

function TJediXmlEngine.Get_HaveParent: Boolean;
begin
  Result:= fNode.Parent <> nil
end;

function TJediXmlEngine.Get_ID: string;
begin
  Result:= 'JclSimpleXML';
end;

//function TJediXmlEngine.Get_NodeName: Boolean;
//begin
//  Result:= fNode.Name
//end;

function TJediXmlEngine.Get_ValidNode: Boolean;
begin
  Result:= fNode <> nil
end;

procedure TJediXmlEngine.Load(const aFilename: TFileName);
begin
  fXmlDoc.LoadFromFile(aFilename);
  SelectRootNode()
end;

procedure TJediXmlEngine.SelectChild(const aIndex: Integer);
begin
  fNode:= fNode.Items[aIndex]
end;

procedure TJediXmlEngine.SelectParent;
begin
  fNode:= fNode.Parent
end;

procedure TJediXmlEngine.SelectRootNode;
begin
  fNode:= fXmlDoc.Root
end;

end.
g2mk
3

Give himXML by himitsu a try.

It is released under the MPL v1.1, GPL v3.0 or LGPL v3.0 license.

To download it you will have to register at Delphi-Praxis, an excellent (German) Delphi site.

Its performance is very impressive, and the distribution includes demos that demonstrate it. I've successfully used it in Delphi 2007, Delphi 2010 and Delphi XE.

menjaraz