Compression Performance Difference between Swift and Managed (C#)

Question

I'm implementing LZF compression in a managed (.NET) environment and decompression in an iOS environment.

This is my LZF decompression code in C#:

    private static int LZFDecompress(byte[] input, byte[] output)
    {
        int inputLength = input.Length;
        int outputLength = output.Length;

        uint iidx = 0;
        uint oidx = 0;

        do
        {
            uint ctrl = input[iidx++];

            if (ctrl < (1 << 5)) /* literal run */
            {
                ctrl++;

                if (oidx + ctrl > outputLength)
                {
                    //SET_ERRNO (E2BIG);
                    return 0;
                }

                do
                    output[oidx++] = input[iidx++];
                while ((--ctrl) != 0);
            }
            else /* back reference */
            {
                uint len = ctrl >> 5;

                int reference = (int)(oidx - ((ctrl & 0x1f) << 8) - 1);

                if (len == 7)
                    len += input[iidx++];

                reference -= input[iidx++];

                if (oidx + len + 2 > outputLength)
                {
                    //SET_ERRNO (E2BIG);
                    return 0;
                }

                if (reference < 0)
                {
                    //SET_ERRNO (EINVAL);
                    return 0;
                }

                output[oidx++] = output[reference++];
                output[oidx++] = output[reference++];

                do
                    output[oidx++] = output[reference++];
                while ((--len) != 0);
            }
        }
        while (iidx < inputLength);

        return (int)oidx;
    }

and I ported it to Swift like this:

    private static func LZFDecompress(input: [UInt8],
                                      output: inout [UInt8])
        -> Int
    {
        let inputLength = input.count
        let outputLength = output.count
        
        var iidx = 0
        var oidx = 0
        
        repeat
        {
            var ctrl = Int(input[iidx])
            iidx += 1
            
            if ctrl < (1 << 5)
            {
                ctrl += 1
                
                if oidx + ctrl > outputLength
                {
                    return 0
                }
                
                repeat
                {
                    output[oidx] = input[iidx]
                    oidx += 1
                    iidx += 1
                    ctrl -= 1
                }
                while ctrl != 0
            }
            else
            {
                var len = ctrl >> 5
                var reference = oidx - ((ctrl & 0x1f) << 8) - 1
                
                if len == 7
                {
                    len += Int(input[iidx])
                    iidx += 1
                }
                
                reference -= Int(input[iidx])
                iidx += 1
                
                if oidx + len + 2 > outputLength
                {
                    return 0
                }
                
                if reference < 0
                {
                    return 0
                }
                
                output[oidx] = output[reference]
                oidx += 1
                reference += 1
                output[oidx] = output[reference]
                oidx += 1
                reference += 1
                
                repeat
                {
                    output[oidx] = output[reference]
                    oidx += 1
                    reference += 1
                    len -= 1
                }
                while len != 0
            }
        }
        while iidx < inputLength
        
        return oidx
    }

The problem is performance: decompressing the same files takes 2-3 seconds in C# but 9-10 seconds in Swift, and I can't understand why.

I tested the C# version as a console app on Windows, and the Swift version in a playground and in a project on a Mac.
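Playground timings are unoptimized, so a fairer comparison builds both programs with optimizations (for Swift, `swiftc -O` or Xcode's Release configuration; for .NET, `dotnet run -c Release`) and times them on the same machine with a monotonic clock. A minimal Swift timing helper might look like this; `bestTime` is a hypothetical name and the workload is only a placeholder for the real `LZFDecompress` call.

```swift
import Dispatch

/// Hypothetical helper: runs `body` `iterations` times and returns the fastest
/// wall-clock duration in seconds. Taking the minimum filters out one-off noise.
func bestTime(iterations: Int = 5, _ body: () -> Void) -> Double {
    var best = Double.infinity
    for _ in 0..<iterations {
        let start = DispatchTime.now().uptimeNanoseconds  // monotonic clock
        body()
        let end = DispatchTime.now().uptimeNanoseconds
        best = min(best, Double(end - start) / 1_000_000_000)
    }
    return best
}

// Placeholder workload; in practice the closure would call LZFDecompress
// on the real input with a preallocated output buffer.
let data = [UInt8](repeating: 1, count: 1_000_000)
var checksum = 0
let seconds = bestTime { checksum = data.reduce(0) { $0 &+ Int($1) } }
print("best of 5: \(seconds) s (checksum \(checksum))")
```

Storing a result (here `checksum`) and printing it keeps the optimizer from discarding the measured work.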

Your `private struct OutputBuffer` does absolutely nothing of value, you should remove it and just pass `byte[] buffer` around directly. And it's dangerous to have `implicit` conversion operators defined for reference-types without gracefully handling nulls. — Dai, Mar 22 '22 at 09:03
Have you used your C# and Swift performance profiling tools to analyze your code and consequently identified where Swift is spending its time compared to the C# version? If not, **why not?** — Dai, Mar 22 '22 at 09:04
Are you certain your use of C#'s postfix `++` is behaving identically to Swift's `+= 1` operator? (I know C# doesn't suffer from C/C++'s `++` operator's [UB](https://stackoverflow.com/questions/949433/why-are-these-constructs-using-pre-and-post-increment-undefined-behavior), but it can still trip you up if you're not careful (arguably most people should be using the prefix operator instead), so I'm sympathetic to Apple's decision to just remove `++` from Swift entirely.) — Dai, Mar 22 '22 at 09:08
You should never test performance in a Swift playground, since playgrounds are unoptimised. Also, did you test both implementations with optimised builds? — Dávid Pásztor, Mar 22 '22 at 09:45
Did you use debug or release builds? And forget about testing performance in a reliable way in an Xcode playground. — Joakim Danielson, Mar 22 '22 at 09:45
@Dai Thank you very much for your advice. I checked the prefix/postfix behavior twice and I don't think it's the problem. I also fixed my code regarding `OutputBuffer`. I don't have much experience with the profiling tools in either environment, so I'm going to look into them... — wonki, Mar 22 '22 at 14:02
@DávidPásztor Thank you. I also suspected the playground, so I tried an iOS build, but the result was the same... So I need to find a proper way to test performance. — wonki, Mar 22 '22 at 14:06
@Joakim Danielson The same goes for you; I can only tag one person per comment. — wonki, Mar 22 '22 at 14:07
@wonki Why an iOS build? That's not going to help either - you need to use the same platform (OS and hardware) for both .NET and Swift. Fortunately you can build and run (headless) C# and Swift programs on macOS, Linux, and Windows now. — Dai, Mar 22 '22 at 14:18
I tried your algorithm in Playground, and it took 2.5 sec for 50 MB input and output buffers on my Mac Pro... So I wonder how you are measuring. Also, your function will crash with index out of bounds if it enters `if`... — timbre timbre, Mar 22 '22 at 17:23
@ytrewq My file was 200 MB, and I just measured the elapsed time of the function call. My Mac is a 15-inch MacBook Pro (2019) with an Intel i9. — wonki, Mar 23 '22 at 00:46
@Dai Oh, in my case I'm dealing with medical DICOM data. The files are processed, compressed, and uploaded by a server or a Windows desktop program, then downloaded to an iPad, where the iPad application decompresses and displays them. As you said, the platform difference may cause the performance difference, so I'm going to check that too. Thank you. — wonki, Mar 23 '22 at 01:06
There are plenty of high-quality DICOM libraries around, and even more general-purpose LZ-family compression/decompression libraries. I'm not surprised at the performance differences you're seeing, what I am surprised at is that you're reinventing the wheel for no good reason: for example, both your C# and Swift programs are byte-by-byte implementations of LZF, which is just silly today because modern CPUs have SIMD and vector operation support built-in which your C# and Swift programs don't seem to be using - but if they did, you'd probably see an order-of-magnitude boost in perf. — Dai, Mar 23 '22 at 01:17
@Dai Yes, you are right. It's silly to implement it myself without CPU and SIMD knowledge. I've now changed the compression method to LZ4 (supported by Apple), and performance is much better (under 1 second)! But I still need to find a compatible package for .NET. — wonki, Mar 23 '22 at 03:26
Another thing to note: all of Swift’s arithmetic operators do overflow checks at runtime. This adds safety (no silent overflows will happen without you noticing), but at a runtime performance cost. This is usually minor, but can add up in hot loops like this. As always, profiling is the answer. If you find that the arithmetic bounds checks are a hotspot, you can use the unchecked operators like `&+`, `&-`, etc. — Alexander, Apr 14 '22 at 18:58
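The comments point at three likely costs in the Swift port: a bounds check on every `input[...]`/`output[...]` access, overflow-checked arithmetic, and byte-by-byte copying. The sketch below (a hypothetical `lzfDecompressFast`, not benchmarked) addresses all three: it works on unsafe buffer pointers, uses the wrapping `&+`/`&-` operators, and copies literal runs and non-overlapping back references with `memcpy`. Because unsafe pointers no longer trap on bad indices, it also adds the input-bounds checks that both posted versions lack, so malformed input returns 0 instead of crashing or reading out of bounds. Whether it closes the gap with C# should be confirmed by profiling an optimized build.

```swift
import Foundation  // memcpy

/// Hypothetical faster LZF decoder (sketch). Returns the number of bytes written,
/// or 0 if the input is malformed or `output` is too small, like the posted code.
func lzfDecompressFast(_ input: [UInt8], into output: inout [UInt8]) -> Int {
    let inCount = input.count
    let outCount = output.count  // read before taking the mutable buffer
    return input.withUnsafeBufferPointer { src -> Int in
        output.withUnsafeMutableBufferPointer { dst -> Int in
            guard let inp = src.baseAddress, let out = dst.baseAddress else { return 0 }
            var i = 0  // input index
            var o = 0  // output index
            while i < inCount {
                let ctrl = Int(inp[i])
                i &+= 1
                if ctrl < 32 {
                    // Literal run: the next ctrl + 1 input bytes are copied as-is.
                    let n = ctrl &+ 1
                    guard i &+ n <= inCount, o &+ n <= outCount else { return 0 }
                    _ = memcpy(out + o, inp + i, n)
                    i &+= n
                    o &+= n
                } else {
                    // Back reference: top 3 bits are the length (7 = read one more byte),
                    // low 5 bits plus the next byte are the distance minus 1.
                    var len = ctrl >> 5
                    if len == 7 {
                        guard i < inCount else { return 0 }
                        len &+= Int(inp[i])
                        i &+= 1
                    }
                    guard i < inCount else { return 0 }
                    let distance = ((ctrl & 0x1f) << 8) &+ Int(inp[i]) &+ 1
                    i &+= 1
                    let count = len &+ 2  // a back reference copies at least 3 bytes
                    var ref = o &- distance
                    guard ref >= 0, o &+ count <= outCount else { return 0 }
                    if distance >= count {
                        // Source and destination don't overlap: one bulk copy.
                        _ = memcpy(out + o, out + ref, count)
                        o &+= count
                    } else {
                        // Overlapping copy repeats a short pattern, so go byte by byte.
                        for _ in 0..<count {
                            out[o] = out[ref]
                            o &+= 1
                            ref &+= 1
                        }
                    }
                }
            }
            return o
        }
    }
}
```

As with the original, the caller preallocates `output` at the decompressed size, e.g. `var out = [UInt8](repeating: 0, count: expectedSize)`, since the raw LZF data handled here does not record it.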

score 0 · Answer 1 · answered Mar 24 '22 at 02:10

My original code was inefficient because it ignored SIMD and other CPU features. So I switched to the decompression methods (LZ4, zlib) provided by Apple, which are much faster: decompressing a 200 MB file now takes less than 1 second.

In a managed environment (C#), however, it's still slower than unmanaged code. If you need more performance, use a native implementation.

I use these managed zlib libraries:
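The answer doesn't say which Apple API it used. One option is the buffer-based C API of Apple's Compression framework, `compression_encode_buffer` / `compression_decode_buffer`, which return the number of bytes written (0 on failure). The round-trip sketch below uses it with `COMPRESSION_LZ4` and `COMPRESSION_ZLIB`; the function name `appleRoundTrip` and the 1024-byte headroom are assumptions for illustration, and the code only compiles where the framework exists (hence `#if canImport(Compression)`). Before picking a .NET counterpart, check Apple's documentation for the exact stream format each algorithm constant produces.

```swift
#if canImport(Compression)
import Compression

/// Hypothetical helper: encodes `data` with Apple's Compression framework and
/// decodes it again. Returns nil if either call reports failure (0 bytes written).
func appleRoundTrip(_ data: [UInt8], algorithm: compression_algorithm) -> [UInt8]? {
    // Assumed headroom for framing/incompressible input; encode returns 0 if too small.
    let capacity = data.count + 1024
    var encoded = [UInt8](repeating: 0, count: capacity)
    // nil scratch buffer: the framework allocates its own working memory.
    let encodedSize = compression_encode_buffer(&encoded, capacity, data, data.count, nil, algorithm)
    guard encodedSize > 0 else { return nil }

    // The destination must be large enough for the decoded data, so the caller
    // has to know (or bound) the decompressed size, as with the LZF code.
    let expected = data.count
    var decoded = [UInt8](repeating: 0, count: expected)
    let decodedSize = compression_decode_buffer(&decoded, expected, encoded, encodedSize, nil, algorithm)
    guard decodedSize == expected else { return nil }
    return decoded
}
#endif
```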

https://github.com/jstedfast/Ionic.Zlib
https://github.com/Kulestar/unity-zlib (unity version, dotnet-mono)

With them, decompressing the same file takes 6-7 seconds and compressing it takes 30 seconds.

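You also need the following settings to be compatible with Apple's zlib implementation. The key one is the header: Apple's `COMPRESSION_ZLIB` works with raw DEFLATE data (RFC 1951) without the zlib (RFC 1950) header, so the codec is initialized with `false` (no zlib header).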

    public static byte[] Compress(byte[] inputData)
    {
        var zlib = new ZlibCodec(CompressionMode.Compress);
        zlib.CompressLevel = Zlib.CompressionLevel.AppleSupported; // Level5
        zlib.InputBuffer = inputData;
        zlib.OutputBuffer = new byte[inputData.Length];
        zlib.NextIn = 0;
        zlib.AvailableBytesIn = inputData.Length;
        zlib.NextOut = 0;
        zlib.AvailableBytesOut = inputData.Length;
        zlib.InitializeDeflate(Zlib.CompressionLevel.AppleSupported, false); 
        // 'false' = raw DEFLATE (RFC 1951), no zlib (RFC 1950) header
        zlib.Deflate(FlushType.Finish);
        var output = new byte[zlib.TotalBytesOut];
        Array.Copy(zlib.OutputBuffer, output, (int)zlib.TotalBytesOut);
        return output;
    }

    public static byte[] Decompress(byte[] inputData, int outputSize)
    {
        var zlib = new ZlibCodec(CompressionMode.Decompress);
        zlib.CompressLevel = Zlib.CompressionLevel.AppleSupported;
        zlib.InputBuffer = inputData;
        zlib.OutputBuffer = new byte[outputSize];
        zlib.NextIn = 0;
        zlib.AvailableBytesIn = inputData.Length;
        zlib.NextOut = 0;
        zlib.AvailableBytesOut = outputSize;
        zlib.InitializeInflate(false);
        zlib.Inflate(FlushType.Finish);
        var output = new byte[zlib.TotalBytesOut];
        Array.Copy(zlib.OutputBuffer, output, (int)zlib.TotalBytesOut);
        return output;
    }
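For reference, the RFC 1950 zlib wrapper that `false` turns off is small: a 2-byte header (CMF, FLG) in front of the DEFLATE data and a 4-byte big-endian Adler-32 checksum after it. If the .NET side ever emits wrapped data (for example .NET 6's `System.IO.Compression.ZLibStream`, whereas `DeflateStream` writes raw DEFLATE), a sketch like the hypothetical `stripZlibWrapper` below recovers the raw DEFLATE stream before it is handed to Apple's decoder. It validates the header fields but does not verify the Adler-32 checksum, and it rejects streams that use a preset dictionary.

```swift
/// Hypothetical helper: strips an RFC 1950 zlib wrapper (2-byte header and
/// 4-byte Adler-32 trailer), leaving the raw RFC 1951 DEFLATE stream.
/// Returns nil if `data` doesn't start with a valid zlib header.
func stripZlibWrapper(_ data: [UInt8]) -> [UInt8]? {
    guard data.count >= 6 else { return nil }  // header + trailer at minimum
    let cmf = Int(data[0])
    let flg = Int(data[1])
    // CM (low 4 bits of CMF) must be 8 (DEFLATE) and CINFO (high 4 bits) at most 7.
    guard (cmf & 0x0F) == 8, (cmf >> 4) <= 7 else { return nil }
    // The 16-bit value CMF * 256 + FLG must be a multiple of 31.
    guard (cmf * 256 + flg) % 31 == 0 else { return nil }
    // FDICT (bit 5 of FLG) means a 4-byte dictionary ID follows; not handled here.
    guard (flg & 0x20) == 0 else { return nil }
    return Array(data[2..<(data.count - 4)])
}
```

For example, zlib's encoding of an empty input is `78 9C 03 00 00 00 00 01`; stripping the wrapper leaves the raw DEFLATE bytes `03 00`.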

I hope this helps anyone who, like me, is implementing cross-platform compression.

You don’t need the `Array.Copy` calls or duplicate buffers at all. Also, consider using `Span` / `ReadOnlySpan` instead of `Byte[]`. — Dai, Mar 24 '22 at 02:15
@Dai Thank you for the detailed advice. I'll try `Span` and modify the code as you suggested. — wonki, Mar 24 '22 at 02:24
@Dai Unfortunately, `Span` and `ReadOnlySpan` can't be used in async methods or lambdas... — wonki, Mar 24 '22 at 02:37
I don't see how that's relevant in this case: none of your posted code is `async` nor does it use lambda-functions. — Dai, Mar 24 '22 at 03:09
@Dai Oh, sorry. I agree with you. In general, using `Span` looks better than copying arrays. — wonki, Mar 24 '22 at 04:55
