So I just found a really weird issue in my app, and it turns out it was caused by the .NET Native compiler, of all things.

I have a method that compares the contents of two files, and it works fine: with two 400KB files it takes about 0.4 seconds to run on my Lumia 930 in Debug mode. In Release mode, though, it takes up to 17 seconds for no apparent reason. Here's the code:

// Compares the content of the two streams
private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
    // Initialization
    const int bytes = 8;
    int iterations = (int)Math.Ceiling((double)size / bytes);
    byte[] one = new byte[bytes];
    byte[] two = new byte[bytes];

    // Read all the bytes and compare them 8 at a time
    for (int i = 0; i < iterations; i++)
    {
        await fileStream.ReadAsync(one, 0, bytes);
        await testStream.ReadAsync(two, 0, bytes);
        if (BitConverter.ToUInt64(one, 0) != BitConverter.ToUInt64(two, 0)) return false;
    }
    return true;
}

/// <summary>
/// Checks if the content of two files is the same
/// </summary>
/// <param name="file">The source file</param>
/// <param name="test">The file to test</param>
public static async Task<bool> ContentEquals([NotNull] this StorageFile file, [NotNull] StorageFile test)
{
    // If the two files have a different size, just stop here
    ulong size = await file.GetFileSizeAsync();
    if (size != await test.GetFileSizeAsync()) return false;

    // Open the two files to read them
    try
    {
        // Direct streams
        using (Stream fileStream = await file.OpenStreamForReadAsync())
        using (Stream testStream = await test.OpenStreamForReadAsync())
        {
            return await ContentEquals(size, fileStream, testStream);
        }
    }
    catch (UnauthorizedAccessException)
    {
        // Copy streams
        StorageFile fileCopy = await file.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
        StorageFile testCopy = await test.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
        using (Stream fileStream = await fileCopy.OpenStreamForReadAsync())
        using (Stream testStream = await testCopy.OpenStreamForReadAsync())
        {
            // Compare the files
            bool result = await ContentEquals(size, fileStream, testStream);

            // Delete the temp files at the end of the operation
            Task.Run(() =>
            {
                fileCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
                testCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
            }).Forget();
            return result;
        }
    }
}

Now, I have absolutely no idea why this exact same method goes from 0.4 seconds all the way up to more than 15 seconds when compiled with the .NET Native toolchain.

I worked around the issue by reading each file in full with a single ReadAsync call, then computing an MD5 hash of each result and comparing the two. This approach ran in around 0.4 seconds on my Lumia 930, even in Release mode.
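
Roughly, the workaround looked like this (a simplified sketch, not the exact code; the HashEquals name is illustrative, it assumes the System.Security.Cryptography MD5 APIs and System.Linq's SequenceEqual are available in the target profile, and it assumes a single ReadAsync call fills the whole buffer, which it did for files this small):

// Simplified sketch of the hash-based workaround: read both files in full,
// then hash the contents and compare the two digests
private static async Task<bool> HashEquals(Stream fileStream, Stream testStream)
{
    // Read each file with a single call (assumes the buffer gets filled
    // completely, which held for these small files)
    byte[] first = new byte[fileStream.Length];
    byte[] second = new byte[testStream.Length];
    await fileStream.ReadAsync(first, 0, first.Length);
    await testStream.ReadAsync(second, 0, second.Length);

    // Compare the two MD5 digests (SequenceEqual comes from System.Linq)
    using (MD5 md5 = MD5.Create())
    {
        return md5.ComputeHash(first).SequenceEqual(md5.ComputeHash(second));
    }
}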

Still, I'm curious about this issue and I'd like to know why it was happening.

Thank you in advance for your help!

EDIT: I've tweaked my method to reduce the number of actual I/O operations; this is the result, and it looks like it's working fine so far.

private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
    // Initialization
    const int bytes = 102400;
    int iterations = (int)Math.Ceiling((double)size / bytes);
    byte[] first = new byte[bytes], second = new byte[bytes];

    // Read and compare the two files one chunk at a time
    for (int i = 0; i < iterations; i++)
    {
        // Read the next data chunk
        int[] counts = await Task.WhenAll(fileStream.ReadAsync(first, 0, bytes), testStream.ReadAsync(second, 0, bytes));
        if (counts[0] != counts[1]) return false;
        int target = counts[0];

        // Compare the chunk contents 8 bytes at a time, stopping before
        // the comparison would run past the bytes actually read
        int j;
        for (j = 0; j + 8 <= target; j += 8)
        {
            if (BitConverter.ToUInt64(first, j) != BitConverter.ToUInt64(second, j)) return false;
        }

        // Compare any remaining bytes in the chunk one at a time
        while (j < target)
        {
            if (first[j] != second[j]) return false;
            j++;
        }
    }
    return true;
}
Sergio0694
  • I don't know if you are having the same issue, but try the code in [this answer](http://stackoverflow.com/a/34734203/1822514) and see if it makes a difference. – chue x Sep 01 '16 at 16:47

1 Answer

Reading eight bytes at a time from an I/O device is a performance disaster; that's why we use buffered reading (and writing) in the first place. Every I/O request takes time to be submitted, processed, executed and finally returned.
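
To put numbers on it: a 400 KB file is 409,600 bytes, so reading it 8 bytes at a time means 51,200 awaited ReadAsync calls per file, over 100,000 for the pair. Spread over the 17 seconds you measured, that's only about 0.17 ms per call - a per-request overhead that small is entirely plausible, and it adds up to exactly the kind of delay you're seeing.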

OpenStreamForReadAsync doesn't appear to use a buffered stream, so your 8-byte reads each issue an actual 8-byte I/O request. Even on a solid-state drive, this is very slow.

You don't need to read the whole file at once, though. The usual approach is to pick a reasonable buffer size to read ahead; reading something like 1 kiB at a time should fix the whole issue without requiring you to load the entire file into memory at once. You can put a BufferedStream between the file and your reads to handle this for you. And if you're feeling adventurous, you could issue the next read request before the CPU is done processing the current one - though given how much of the work here is just I/O, that's unlikely to help your performance much.
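
As a rough sketch (assuming BufferedStream is available in your target profile; the 1 kiB buffer size is just illustrative), the wrapping could look like this with the ContentEquals helper from your question:

// Sketch only: BufferedStream services most of the 8-byte reads from an
// in-memory buffer and only issues a larger request to the underlying
// stream when that buffer runs dry
using (Stream fileStream = new BufferedStream(await file.OpenStreamForReadAsync(), 1024))
using (Stream testStream = new BufferedStream(await test.OpenStreamForReadAsync(), 1024))
{
    return await ContentEquals(size, fileStream, testStream);
}

With a 1 kiB buffer, only one in every 128 of the 8-byte reads actually reaches the device.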

It also seems that .NET Native has considerably more overhead than managed .NET for asynchronous I/O in the first place, which makes those tiny asynchronous calls all the more of a problem. Fewer requests for larger chunks of data will help.

Luaan
  • Thank you for your explanation! I tried using the AsStreamForRead method to get a buffered stream and I tried different buffer sizes (1KB, 2KB, 4KB), but it always takes around 10-15 seconds to run, and I'm working with 500KB files here. How's that possible? Shouldn't the buffered stream solve this? – Sergio0694 Sep 01 '16 at 13:50
  • @Sergio0694 That's probably the part where .NET native has poor performance on asynchronous calls. Not a big deal on a real I/O operation, but it adds up when buffering is used. You'll probably have to use a `byte[]` buffer of your own and iterate over it, only using `ReadAsync` to fetch another batch. Fortunately, this is very easy to do, since `BitConverter` accepts an offset into the `byte[]` argument :) – Luaan Sep 01 '16 at 15:30
  • I've edited my question with the updated method, looks like that fixed it, let me know what you think! – Sergio0694 Sep 01 '16 at 17:58
  • @Sergio0694 Looks fine to me. Though I'd expect you don't need the buffer quite that big :P – Luaan Sep 01 '16 at 18:23