
I find 7-Zip great and I would like to use it in .NET applications. I have a 10 MB file (a.001) and it takes:


2 seconds to encode.

Now it would be nice if I could do the same thing in C#. I downloaded the LZMA SDK C# source code from http://www.7-zip.org/sdk.html and basically copied the CS directory into a console application in Visual Studio.

Then I compiled and everything compiled smoothly. In the output directory I placed the file a.001, which is 10 MB in size. In the Main method that came with the source code I placed:

[STAThread]
static int Main(string[] args)
{
    // e stands for encode
    args = "e a.001 output.7z".Split(' '); // added this line for debug

    try
    {
        return Main2(args);
    }
    catch (Exception e)
    {
        Console.WriteLine("{0} Caught exception #1.", e);
        // throw e;
        return 1;
    }
}

When I execute the console application it works and I get output.7z in the working directory. The problem is that it takes so long: about 15 seconds to execute! I have also tried the approach from https://stackoverflow.com/a/8775927/637142 and it also takes very long. Why is it 10 times slower than the actual program?

Also

Even if I set 7-Zip to use only one thread, it still takes much less time than my program (3 seconds vs 15).


(Edit) Another Possibility

Could it be because C# is slower than assembly or C? I noticed that the algorithm does a lot of heavy operations. For example, compare these two blocks of code; they both do the same thing:

C

#include <time.h>
#include <stdio.h>

int main(void)
{
    time_t now;

    int i, j, k, x;
    long counter;

    counter = 0;

    now = time(NULL);

    /* LOOP  */
    for (x = 0; x < 10; x++)
    {
        counter = -1234567890 + x + 2;

        for (j = 0; j < 10000; j++)
            for (i = 0; i < 1000; i++)
                for (k = 0; k < 1000; k++)
                {
                    if (counter > 10000)
                        counter = counter - 9999;
                    else
                        counter = counter + 1;
                }

        printf(" %d  \n", (int)(time(NULL) - now)); /* display elapsed time */
    }

    printf("counter = %ld\n\n", counter); /* display result of counter */

    printf("Elapsed time = %d seconds ", (int)(time(NULL) - now));
    getchar(); /* wait */
    return 0;
}

Output: (screenshot of elapsed times)

C#

static void Main(string[] args)
{       
    DateTime now;

    int i, j, k, x;
    long counter;

    counter = 0;

    now = DateTime.Now;

    /* LOOP  */
    for (x = 0; x < 10; x++)
    {
        counter = -1234567890 + x + 2;

        for (j = 0; j < 10000; j++)            
            for (i = 0; i < 1000; i++)                
                for (k = 0; k < 1000; k++)
                {
                    if (counter > 10000)
                        counter = counter - 9999;
                    else
                        counter = counter + 1;
                }


        Console.WriteLine(((int)(DateTime.Now - now).TotalSeconds).ToString());
    }

    Console.Write("counter = {0} \n", counter.ToString());
    Console.Write("Elapsed time = {0} seconds", DateTime.Now - now);
    Console.Read();
}

Output: (screenshot of elapsed times)

Note how much slower C# was. Both programs were run outside Visual Studio in release mode. Maybe that is the reason it takes so much longer in .NET than in C.

Also, I got the same kind of result with the compressor: C# was 3 times slower, just like in the example I just showed!


Conclusion

I cannot figure out what is causing the problem. I guess I will use 7z.dll and invoke the necessary methods from C#. A library that does that is available at http://sevenzipsharp.codeplex.com/; that way I am using the same library that 7-Zip itself uses:

    // don't forget to add a reference to SevenZipSharp, located at the link I provided
    static void Main(string[] args)
    {
        // load the dll
        SevenZip.SevenZipCompressor.SetLibraryPath(@"C:\Program Files (x86)\7-Zip\7z.dll");

        SevenZip.SevenZipCompressor compress = new SevenZip.SevenZipCompressor();

        compress.CompressDirectory("MyFolderToArchive", "output.7z");
    }
Tono Nam
  • Just a guess - any difference on retail vs debug? – bryanmac Sep 06 '12 at 03:38
  • same thing, I tried running on release and no difference :( – Tono Nam Sep 06 '12 at 03:39
  • Could it be because C# is much slower than asm or C? I love C#, but I don't think it is anywhere near as fast as languages such as assembly or C. – Tono Nam Sep 06 '12 at 03:40
  • 2
    Is there any improvement if you give the app a warmup period (like run 5 iterations, and then profile). – Tim M. Sep 06 '12 at 03:57
  • 1
    Think you posted the wrong screenshot for c# (I'm curious about the results). – Tim M. Sep 08 '12 at 03:15
  • True, I posted the wrong image. I will run it again and show the results. – Tono Nam Sep 08 '12 at 03:39
  • Interesting...any chance you can post the c++ .exe somewhere? I'd be interested to run your exact copy locally (and compile the c# app locally and play around with it). – Tim M. Sep 08 '12 at 04:27
  • Yeah both projects are at: https://dl.dropbox.com/u/81397375/Demo.zip compile both projects in release mode and run them outside visual studio. Let me know your results! – Tono Nam Sep 08 '12 at 04:47
  • 1
    Cpp 44 seconds, c# 153 seconds. I expect that cpp would beat c# on some things, but this exercise amounts to simple assembly instructions with primitive types (even the IL code demonstrates this). I doubt the cpp compiler is optimizing the loops away, otherwise it would execute instantly (nor would that explain the 7z performance discrepancies). I'd love to know what the difference is; could be insightful for tuning .Net apps. – Tim M. Sep 08 '12 at 05:25
  • Maybe it could be because .NET checks for so many things. For example, if counter overflows I will get an exception in C#; on the other hand, Cpp will continue running. I tried wrapping the whole C# code in an unchecked block (`unchecked { .... }`) hoping it would get faster, but it didn't. – Tono Nam Sep 08 '12 at 05:30
  • Yeah, I tried that too...changed platform target, removed bounds checking, changed the long to an int (which removed some IL instructions but didn't give a noticeable improvement) – Tim M. Sep 08 '12 at 05:30
  • For the second example try RyuJIT CTP4; you will see it runs in about the same time as the C++ version – Onur Gumus May 24 '14 at 02:46

6 Answers


I ran a profiler on the code, and the most expensive operation appears to be searching for matches. In C#, it searches a single byte at a time. There are two functions (GetMatches and Skip) in LzBinTree.cs that contain the following code snippet, and it spends something like 40-60% of its time on this code:

if (_bufferBase[pby1 + len] == _bufferBase[cur + len])
{
    while (++len != lenLimit)
        if (_bufferBase[pby1 + len] != _bufferBase[cur + len])
            break;

It's basically trying to find the match length a single byte at a time. I extracted that into its own method:

if (GetMatchLength(lenLimit, cur, pby1, ref len))
{

And if you use unsafe code and cast the byte* to a ulong* and compare 8 bytes at a time instead of 1, the speed almost doubled for my test data (in a 64 bit process):

private bool GetMatchLength(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len)
{
    if (_bufferBase[pby1 + len] != _bufferBase[cur + len])
        return false;
    len++;

    // This method works with or without the following line, but with it,
    // it runs much much faster:
    GetMatchLengthUnsafe(lenLimit, cur, pby1, ref len);

    while (len != lenLimit
        && _bufferBase[pby1 + len] == _bufferBase[cur + len])
    {
        len++;
    }
    return true;
}

private unsafe void GetMatchLengthUnsafe(UInt32 lenLimit, UInt32 cur, UInt32 pby1, ref UInt32 len)
{
    const int size = sizeof(ulong);
    if (lenLimit < size)
        return;
    lenLimit -= size - 1;
    fixed (byte* p1 = &_bufferBase[cur])
    fixed (byte* p2 = &_bufferBase[pby1])
    {
        while (len < lenLimit)
        {
            if (*((ulong*)(p1 + len)) == *((ulong*)(p2 + len)))
            {
                len += size;
            }
            else
                return;
        }
    }
}
Bryce Wagner
  • On my sample (x64) workload (161 mb) overall compression time went from 142.19s to 140.29s (1.3% better). Profiler showed that the above changed the time in BinTree.Skip by -45% and BinTree.GetMatches by +3%. – Joseph Kingry Mar 27 '14 at 16:27
  • So how does the managed code compare to unmanaged code for that same data? If that's not the bottleneck for the data you're testing, maybe you could find other bottlenecks that could be improved. – Bryce Wagner Mar 28 '14 at 23:32

This kind of binary-arithmetic and branching-heavy code is what C compilers love and the .NET JIT handles poorly. The .NET JIT is not a very smart compiler; it is optimized for fast compilation. If Microsoft wanted to tune it for maximum performance they could plug in the VC++ backend, but they intentionally don't.

Also, I can tell from the speed you are getting with 7z.exe (6 MB/s) that it is using multiple cores, probably with LZMA2. My fast Core i7 can deliver 2 MB/s per core, so I guess 7z.exe is running multi-threaded for you. Try turning on threading in the 7-Zip library if that is possible.

I recommend that instead of the managed LZMA implementation you either use a natively compiled library or call 7z.exe using Process.Start. The latter should get you started very quickly, with good results.
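A minimal sketch of the Process.Start approach. The 7z.exe install path and the switches are assumptions; adjust them for your machine ("a" is 7-Zip's add-to-archive command):

```csharp
using System;
using System.Diagnostics;

class SevenZipRunner
{
    static void Main()
    {
        // Assumed install path; "a output.7z a.001" adds a.001 to output.7z.
        var psi = new ProcessStartInfo
        {
            FileName = @"C:\Program Files\7-Zip\7z.exe",
            Arguments = "a output.7z a.001",
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(psi))
        {
            process.WaitForExit();
            Console.WriteLine("7z exited with code " + process.ExitCode);
        }
    }
}
```

This way the compression runs in the same optimized native code as the 7-Zip UI, at the cost of launching an external executable.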

usr

I haven't used the LZMA SDK myself, but I am pretty sure that by default 7-Zip runs most of its operations on many threads. As I haven't done it myself, the only thing I can suggest is to check whether it is possible to force it to use many threads (if that is not already the default).

Edit:

As it seems that threading may not be (the only) performance-related problem, here are other things I can think of:
  1. Have you checked that you've set the very same options as when using the 7-Zip UI? Is the output file the same size? If not, it may be that one compression method is much faster than the other.

  2. Are you executing your application from within VS or not? If so, that could add some overhead too (but I guess it should not make an app run 5 times slower).

  3. Are there any other operations taking place before compressing the file?
Maciek Talaska
  • +1 Thanks! I built it in release then executed it outside of Visual Studio and it came down to 5 seconds! But there is still a big difference. For example, compressing a 40 MB file takes 21 seconds in C# and 6 seconds in the real program. I'd say the ratio is 1:3, which in my opinion is still a lot :( – Tono Nam Sep 06 '12 at 04:12
  • Yes, unfortunately the difference is still significant. Have you checked that both output files (i.e. the .7z files) are the same (same size, binary-identical content)? If not, check the options and make sure that you're compressing using the same methods (as well as dictionary size etc.). For example, 'store only' or 'minimum compression' are very fast but do not compress very well, and 'max compression' is quite the opposite. – Maciek Talaska Sep 06 '12 at 04:17
  • Yeah, for some reason it is hard to do that with code. If I set the compression to Ultra (highest possible) and set 1 thread it takes 6 seconds. If I then shift the word size to 786 MB it takes 21 seconds, so maybe that is the problem. I will try to set the word size in my program somehow and let you know – Tono Nam Sep 06 '12 at 04:25
  • The documentation of 7-Zip shows the defaults as: `dictionary size - [0, 29], default: 23 (8MB), number of fast bytes - [5, 273], default: 128` etc... I set the same ratios in 7-Zip and it is 3 times faster. I guess it has to do with the program... – Tono Nam Sep 06 '12 at 04:35
  • 1
    Try to test the operation twice. When C#-generated MSIL code gets executed is JIT-compiled (which takes time). Also the dll libraries may be read from the disk. Both operations are lengthy. Subsequent calls to the same methods will not cause reading dlls from disk and JIT compilation. – Artemix Sep 11 '12 at 13:13
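The warm-up idea from these comments can be sketched as follows. This is a minimal illustration, not the asker's actual benchmark: `Work` is a stand-in for whatever call you want to measure, and the point is that the first, untimed call pays the JIT-compilation cost:

```csharp
using System;
using System.Diagnostics;

class WarmupTiming
{
    // Stand-in workload; replace with the real call you want to measure.
    static long Work()
    {
        long counter = 0;
        for (int i = 0; i < 1000000; i++)
            counter += (counter > 10000) ? -9999 : 1;
        return counter;
    }

    static void Main()
    {
        Work(); // warm-up call: forces JIT compilation of Work()

        var sw = Stopwatch.StartNew();
        long result = Work();
        sw.Stop();

        Console.WriteLine("counter = " + result);
        Console.WriteLine("elapsed ms = " + sw.ElapsedMilliseconds);
    }
}
```

Stopwatch is also a better timer for this than DateTime.Now subtraction, since it uses a high-resolution counter.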

I've just taken a look at the LZMA CS implementation, and it is all performed in managed code. Having recently done some investigation into this for a compression requirement on my current project, most implementations of compression in managed code seem to perform less efficiently than native ones.

I can only presume that this is the cause of the problem here. If you look at the performance table for another compression tool, QuickLZ, you can see the difference in performance between native and managed code (whether C# or Java).

Two options come to mind: use .NET's interop facilities to call a native compression method, or, if you can afford to sacrifice compression ratio, take a look at http://www.quicklz.com/.

pixelbadger

Another alternative is to use SevenZipSharp (available on NuGet) and point it to your 7z.dll. Then your speeds should be about the same:

var libPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.ProgramFiles), "7-zip", "7z.dll");
SevenZip.SevenZipCompressor.SetLibraryPath(libPath);
SevenZip.SevenZipCompressor compressor = new SevenZip.SevenZipCompressor();
compressor.CompressFiles(compressedFile, new string[] { sourceFile });
Ed Power

The .NET runtime is slower than native code partly because of the safety checks it performs. If something goes wrong in C, we usually just get an application crash; in C# we get an exception instead, because the checks we don't write ourselves in C are added for us by the runtime. Without an implicit null check, the runtime could never throw a NullReferenceException; without checking an index against the array's length, it could never throw an IndexOutOfRangeException.

These implicit checks, inserted before instructions, are part of what makes the .NET runtime slower. In typical business apps we don't care much about this overhead, because the complexity of the business and UI logic matters more; the runtime guards every instruction with extra care precisely so that we can debug and resolve issues quickly.

Native C programs will generally be faster than .NET, but they are harder to debug and need in-depth knowledge of C to write correctly, because C will execute everything and give you no exception or clue about what went wrong.
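As a small illustration of the bounds checking described above (the array and index are arbitrary examples): the same out-of-range read that is undefined behavior in C is caught by the .NET runtime.

```csharp
using System;

class BoundsDemo
{
    static void Main()
    {
        int[] data = { 1, 2, 3 };

        try
        {
            // In C, reading past the end of an array is undefined behavior;
            // the .NET runtime inserts a bounds check and throws instead.
            Console.WriteLine(data[5]);
        }
        catch (IndexOutOfRangeException)
        {
            Console.WriteLine("IndexOutOfRangeException");
        }
    }
}
```

That bounds check in the innermost loop of a compressor is exactly the kind of per-access cost that does not exist in the native build.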

Akash Kava