Whilst benchmarking a real-world application I came across a surprising performance characteristic relating to the zlib and zip libraries that ship with Delphi.

My real-world application exports .xlsx files. This file format is a collection of XML files wrapped in a ZIP container file. The .xlsx export code generates the XML files and then feeds them to the Delphi ZIP library. Once I had optimised the XML file generation to the point where the ZIP creation was the bottleneck, I discovered, to my surprise, that 64 bit code was significantly slower than 32 bit code.

In order to study this further I created this test program:

program zlib_perf;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics, System.Zip;

const
  LoremIpsum =
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod '+
    'tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, '+
    'quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo '+
    'consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse '+
    'cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat '+
    'non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.';

function GetTestStream: TStream;
var
  Bytes: TBytes;
begin
  Result := TMemoryStream.Create;
  // fill the stream with 500MB of lorem ipsum
  Bytes := TEncoding.UTF8.GetBytes(LoremIpsum);
  while Result.Size < 500*1024*1024 do
    Result.WriteBuffer(Pointer(Bytes)^, Length(Bytes));
end;

procedure DoTest;
var
  DataStream, ZipStream: TStream;
  Stopwatch: TStopwatch;
  Zip: TZipFile;
begin
  DataStream := GetTestStream;
  try
    ZipStream := TMemoryStream.Create;
    try
      Zip := TZipFile.Create;
      try
        Zip.Open(ZipStream, zmWrite);

        Stopwatch := TStopwatch.StartNew;
        DataStream.Position := 0;
        Zip.Add(DataStream, 'foo');
        Writeln(Stopwatch.ElapsedMilliseconds);
      finally
        Zip.Free;
      end;
    finally
      ZipStream.Free;
    end;
  finally
    DataStream.Free;
  end;
end;

begin
  DoTest;
end.

I compiled the program under both XE2 and XE7, for both 32 and 64 bit, and with default release configuration compiler options. My test machine runs Windows 7 x64 on an Intel Xeon E5530.

Here are the results:

Compiler  Target  Time (ms)
     XE2   Win32       8586
     XE2   Win64      18908
     XE7   Win32       8583
     XE7   Win64      19304

I compressed the same file using the Explorer shell ZIP functionality and my rough stopwatch timing was 8 seconds, so the 32 bit times above seem reasonable.

Since the compression algorithm used by the above code is zlib (Delphi's ZIP code supports only store and deflate), my belief is that the zlib library used by Delphi is at the root of this issue. Why is Delphi's zlib library so slow under 64 bit?

David Heffernan

1 Answer

As noted, the Delphi ZIP compression code stands on top of zlib. The Delphi implementation of zlib is a wrapper around the official zlib C source code. The C code is compiled to objects and then linked with {$LINK}. For XE7, the comments at the top of System.ZLib indicate that zlib 1.2.8 was used.

Under the obvious assumption that the time is being spent inside the zlib code, the most plausible explanation for the behaviour is that the 64 bit compiled objects are responsible for the poor performance. Either the compiler used is emitting weak code, or a poor choice of compiler options has been used.
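One way to check that assumption is to take TZipFile out of the picture and time zlib on its own. What follows is only a minimal sketch and was not part of the measurements reported here. It assumes the TZCompressionStream class from System.ZLib with its default compression level, and it uses a smaller input (50MB, an arbitrary choice) built from a shortened filler string. If the 32 bit and 64 bit timings diverge in the same way as in the question, the time is indeed being spent in the linked zlib objects.

```delphi
program zlib_direct;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.Diagnostics, System.ZLib;

procedure DoTest;
const
  // Assumption: 50MB is enough to show the gap; the question used 500MB
  TestSize = 50*1024*1024;
var
  Chunk: TBytes;
  Source, Dest: TMemoryStream;
  Compressor: TZCompressionStream;
  Stopwatch: TStopwatch;
begin
  Chunk := TEncoding.UTF8.GetBytes(
    'Lorem ipsum dolor sit amet, consectetur adipiscing elit. ');
  Source := TMemoryStream.Create;
  try
    // build the uncompressed input in memory
    while Source.Size < TestSize do
      Source.WriteBuffer(Pointer(Chunk)^, Length(Chunk));
    Source.Position := 0;

    Dest := TMemoryStream.Create;
    try
      Stopwatch := TStopwatch.StartNew;
      Compressor := TZCompressionStream.Create(Dest);
      try
        Compressor.CopyFrom(Source, Source.Size);
      finally
        // freeing the compression stream flushes the remaining output
        Compressor.Free;
      end;
      Writeln(Stopwatch.ElapsedMilliseconds);
    finally
      Dest.Free;
    end;
  finally
    Source.Free;
  end;
end;

begin
  DoTest;
end.
```

Build it for both Win32 and Win64 in the release configuration and compare the two numbers.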

So, I took the following steps:

  1. I downloaded the source for zlib 1.2.8 and compiled it with the Microsoft 64 bit compiler, cl.
  2. I used the VS2010 compiler, version 16.00.30319.01, and compiled the objects with the following options: /O2 /GS-.
  3. I then took a copy of System.ZLib.pas and included it in my project, alongside the newly compiled objects. This ensures that the newly compiled zlib objects are used.
  4. I compiled the Delphi program with XE7 for 64 bit.

The run time, on the same machine as used to generate the data in the question, was 6,912ms.

I then recompiled and omitted the /O2 option and went round the loop again. This time the run time was 20,077ms. So I hypothesise that Embarcadero have just been forgetting to compile these objects with optimisations.

I have reported this issue to Embarcadero's Quality Portal: https://quality.embarcadero.com/browse/RSP-9891

As mentioned in a comment below, it seems quite plausible that other libraries that rely on compiled objects may have similar problems. Potential problem areas include:

  • MidasLib, objects are probably not performance critical.
  • Indy, the version shipped with Delphi uses the same zlib objects I believe.
  • System.RegularExpressions, a wrapper around PCRE.
  • Vcl.Imaging.jpeg, built on top of a 3rd party JPEG implementation that is linked as compiled objects.

Update

The Quality Portal issue reports that this issue was fixed in XE8.

David Heffernan
  • I presume that you will file QP report with your findings? – Dalija Prasnikar Jan 07 '15 at 14:12
  • @DalijaPrasnikar I'm doing that now. Chicken and egg. I want to refer to the QP in the answer, and refer to the SO post in the QP! The reason I posted Q&A here is to hopefully benefit other developers who may not find the QP report. – David Heffernan Jan 07 '15 at 14:19
  • Indeed in `.../Studio/15.0/lib/win32/debug` and `.../win32/release` there are separate versions of the zlib `.obj` files, the latter being smaller and seemingly compiled with optimizations (CodeGear C++ 6.90 compiler). In `.../win64/debug`, however, there are no `.obj` files and in `.../win64/release` the `.obj` files seem to be compiled in debug mode (MS Optimizing compiler - keyword `debug` in cleartext at offset `0x3D` in the compiled file). This suggests that other libraries may also be affected. – J... Jan 07 '15 at 14:31
  • `sqlite3_x64.obj` is also among those suspect...that could be a big one for some. – J... Jan 07 '15 at 14:43
  • @J... That thought (other libraries suffering the same problem) had occurred to me too. I was conducting a search for other libraries as your comment came in. I've updated the answer to point that out. Thanks! Upd: As for sqlite, my edition (pro) has no sqlite so I'm not qualified to comment. Please do feel free to add more in an edit if you are inclined to do so. – David Heffernan Jan 07 '15 at 14:43
  • @DavidHeffernan I'm also using pro - think it's maybe part of dbExpress? I do have FireDAC, though, so maybe that's where it comes from. At least support for local SQLite seems part of pro...don't use it myself. – J... Jan 07 '15 at 15:04
  • Just want to report that this is fixed as of Delphi 10 Seattle (actually the QP says they resolved it in XE 8). Tested just now with release builds: 32-bit gets 3874 and 64-bit gets 3308, so 64-bit is a bit faster than 32-bit now. – Brandon Staggs Oct 28 '15 at 16:54