Currently, when I read a 15 MB file, my application goes over a gig of memory. Notice that, at the end of the main code, I compare the data that was inserted into the database with the original array from the file. Any suggestions are welcome.

Main code:

        TestEntities entities = new TestEntities();

        using (FileStream fileStream = new FileStream(fileName + ".exe", FileMode.Open, FileAccess.Read))
        {

            byte[] bytes = new byte[fileStream.Length];

            int numBytesToRead = (int) fileStream.Length;
            int numBytesRead = 0;

            while (numBytesToRead > 0)
            {
                int n = fileStream.Read(bytes, numBytesRead, numBytesToRead);

                if (n == 0)
                    break;

                numBytesRead += n;
                numBytesToRead -= n;
            }

            var query = bytes.Select((x, i) => new {Index = i, Value = x})
                .GroupBy(x => x.Index/100)
                .Select(x => x.Select(v => v.Value).ToList())
                .ToList();

            foreach (List<byte> list in query)
            {
                Binary binary = new Binary();
                binary.Name = fileName + ".exe";
                binary.Value = list.ToArray();
                entities.AddToBinaries(binary);
            }

            entities.SaveChanges();

            List<Binary> fileString = entities.Binaries.Where(b => b.Name == fileName + ".exe").ToList();

            Byte[] final = ExtractArray(fileString);
            if (Compare(bytes, final))
            {
                 // Some notification that everything matched
            }

        }

Compare Method:

public bool Compare(Byte[] array1,Byte[] array2)
    {
        bool isEqual = false;
        if (array1.Count() == array2.Count())
        {

            for (int i = 0; i < array1.Count(); i++)
            {
                isEqual = array1[i] == array2[i];
                if (!isEqual)
                {
                    break;

                }
            }
        }


        return isEqual;
    }

ExtractArray Method:

public Byte[] ExtractArray(List<Binary> binaries )
    {
        List<Byte> finalArray = new List<Byte>();

        foreach (Binary binary in binaries)
        {
            foreach (byte b in binary.Value)
            {
                finalArray.Add(b);
            }

        }

        return finalArray.ToArray();
    }
    Comparing large binary objects can be done using the method described here: http://stackoverflow.com/questions/968935/c-sharp-binary-file-compare – David Brabant Apr 19 '12 at 06:40
  • Is all you are doing a byte-for-byte comparison of two files? Why do you do all this? – IanNorton Apr 19 '12 at 06:44
  • Why don't you store a digest of the files in the database and just digest the other ones, rather than comparing every time? (A sketch of this idea follows these comments.) – IanNorton Apr 19 '12 at 06:45
  • I am segmenting the file and storing it in the database. Since the segmentation could be wrong, and I can't have that, I read it back from the database and verify that it was written correctly. – Oakcool Apr 24 '12 at 22:45
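
Picking up the digest suggestion above: a minimal sketch, assuming you persist the hash somewhere alongside the chunks (the storage column or table is hypothetical and not part of the original schema). You compute a digest of the original bytes at write time, then compare it against a digest of the reassembled bytes instead of comparing byte by byte.

    using System;
    using System.Security.Cryptography;

    // Sketch only: compute a digest once when the file is written, persist it
    // (hypothetical storage, e.g. an extra column next to the chunks), and
    // later verify the reassembled bytes against it.
    public static string ComputeDigest(byte[] data)
    {
        using (SHA256 sha = SHA256.Create())
        {
            return Convert.ToBase64String(sha.ComputeHash(data));
        }
    }

    // Verification then becomes a single string comparison, e.g.:
    // bool ok = storedDigest == ComputeDigest(ExtractArray(fileString));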

3 Answers

2

For starters, I'd strongly recommend that you invest in a profiler. That's the right way to determine why your code is taking so long to run or is using a lot of memory. There are many profilers out there, including one built into Visual Studio 2010 if you have Premium or Ultimate.

See Google or these posts for others:

What Are Some Good .NET Profilers?

and

Best .NET memory and performance profiler?

Secondly, you probably shouldn't be assuming that your app shouldn't go over a gig of memory. C# applications (actually, all .NET applications) are garbage collected. If I have a computer with sufficient RAM, there is no reason why the GC should run if there is no memory pressure, and if it doesn't, the application can easily use up a gig of memory. That is particularly true for 64-bit environments, where processes are not subject to the memory limits of a 32-bit address space.
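
If you want a quick, crude check of how much of that memory is actually live managed objects, as opposed to garbage the GC simply hasn't bothered to reclaim yet, a fragment like the one below works (a profiler is still the better tool, and this only covers the managed heap, not the total working set):

    // Rough check only: GC.GetTotalMemory(true) forces a full collection and
    // returns an approximation of the bytes still reachable on the managed heap.
    long beforeCollect = GC.GetTotalMemory(false);
    long afterCollect = GC.GetTotalMemory(true);
    Console.WriteLine("Reported before forced GC: {0:N0} bytes", beforeCollect);
    Console.WriteLine("Live after forced GC:      {0:N0} bytes", afterCollect);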

Chris Shain
  • I do need to get a profiler, but in this case the profiler will tell me what I already know, since there isn't much code here, so crowdsourcing for better ideas is the best option. – Oakcool Apr 24 '12 at 23:31
  • On your second point, when the application reads a 15 MB file and it goes up to 6 GB in memory, something is really wrong, and that's what's happening now. – Oakcool Apr 24 '12 at 23:32
0

First, two variants of the compare:

bool arraysAreEqual = Enumerable.SequenceEqual(array1, array2);

or this one

    public bool Compare(Byte[] array1, Byte[] array2)
    {
        if (array1.Length != array2.Length)
            return false;

        for (int i = 0; i < array1.Length; i++)
        {
            if (array1[i] != array2[i])
                return false;
        }
        return true;            
    }

For the extract, try this:

foreach (Binary binary in binaries)
{
     finalArray.AddRange(binary.Value);
}
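
For completeness, a possible version of the whole method (just a sketch, keeping the original signature); since the chunks were written 100 bytes at a time, you can also preallocate the list's capacity:

    public Byte[] ExtractArray(List<Binary> binaries)
    {
        // Preallocating avoids repeated internal array growth; 100 is the chunk
        // size used when the file was split (the last chunk may be shorter, so
        // this slightly over-allocates).
        List<Byte> finalArray = new List<Byte>(binaries.Count * 100);

        foreach (Binary binary in binaries)
        {
            finalArray.AddRange(binary.Value);
        }

        return finalArray.ToArray();
    }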
Likurg
0

1) Do you know the static method File.ReadAllBytes? It could save you the first fifteen lines of code.
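
For example, the whole read loop collapses to a single call (using the same fileName variable as in the question):

    // Reads the entire file into a byte array in one call (System.IO).
    byte[] bytes = File.ReadAllBytes(fileName + ".exe");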

2) I hate LINQ... It's unreadable, and it's hard to understand what is really going on.

        var query = bytes.Select((x, i) => new {Index = i, Value = x})
            .GroupBy(x => x.Index/100)
            .Select(x => x.Select(v => v.Value).ToList())
            .ToList();

So for each byte of your file, you create an object containing the byte itself and its index. Wow. If your file is 15 MB, that's 15,728,640 objects. Let's say each object takes 64 bytes; that's 960 MB of memory.

Btw, what are you trying to do?

Edit

var bytes = File.ReadAllBytes(filename);

var chunkCount = (int)Math.Ceiling(bytes.Length / 100.0);

var chunks = new List<ArraySegment<byte>>(chunkCount);


for(int i = 0; i < chunkCount; i++) {
  chunks.Add(new ArraySegment<byte>(
      bytes,
      i * 100,
      Math.Min(100, bytes.Length - i * 100)
  ));
}

This should be several times faster.

Still, for better performance, you might insert the chunks into the database as you read the file, without keeping all those bytes in memory.
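
Something along these lines, as a sketch only: it reuses the TestEntities/Binary types and the 100-byte chunk size from the question, and calls SaveChanges once at the end, which you may want to do in batches instead:

    using (FileStream stream = File.OpenRead(fileName + ".exe"))
    {
        TestEntities entities = new TestEntities();
        byte[] buffer = new byte[100];
        int read;

        // Read the file 100 bytes at a time and queue each chunk for insert as
        // it is read, so the whole file never has to sit in memory at once.
        while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
        {
            byte[] chunk = new byte[read];
            Buffer.BlockCopy(buffer, 0, chunk, 0, read);

            Binary binary = new Binary();
            binary.Name = fileName + ".exe";
            binary.Value = chunk;
            entities.AddToBinaries(binary);
        }

        entities.SaveChanges();
    }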

Nicolas Repiquet
  • So the GroupBy part tries to divide the file into a bunch of 100-byte-long chunks, and then each chunk is inserted into the database. Later I can pull all the chunks and rebuild the file. – Oakcool Apr 24 '12 at 23:37