
I need a high-performance array in C# for a modelling project. This array frequently needs to be resized (preserving its contents). It is frequently accessed by index, and my feeling is that a List will not perform as well as a traditional array[] when doing billions of lookups.

Originally in VB.NET I managed this myself, by keeping a larger array & a manual count:

Private _populations() As Population
Private _populationCount As Integer

' When adding a population...
_populationCount += 1  ' Update the true count of populations, not the array size
If _populationCount > _populations.Length Then
    ' If the array has run out of space, grow it with a buffer of 20%:
    ReDim Preserve _populations(CInt(_populationCount * 1.2))
End If

I can easily reimplement this in C#, but I'm wondering whether there is a tool already in the .NET Framework to do this? Performance is critical, so I cannot accept any performance hit for the sake of elegance or best practice.

P.S. I am not concerned about inserting or removing as I reuse elements manually.
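For reference, the VB.NET pattern above maps to C# roughly as follows. This is only a sketch: `Population` stands in for the question's own type, and the initial capacity and the `Add` helper are illustrative, not part of the original code.

```csharp
using System;

// Illustrative stand-in for the question's own Population type.
class Population { }

class PopulationStore
{
    private Population[] _populations = new Population[16]; // initial capacity is arbitrary
    private int _populationCount;

    public int Count => _populationCount;

    // Lookups go straight to the underlying array.
    public Population this[int i] => _populations[i];

    public void Add(Population p)
    {
        _populationCount++; // true count of populations, not the array size
        if (_populationCount > _populations.Length)
        {
            // Out of space: grow by a 20% buffer, preserving existing contents.
            Array.Resize(ref _populations, (int)(_populationCount * 1.2));
        }
        _populations[_populationCount - 1] = p;
    }

    static void Main()
    {
        var store = new PopulationStore();
        for (int i = 0; i < 100; i++) store.Add(new Population());
        Console.WriteLine(store.Count); // 100
    }
}
```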

Erik Philips
Brendan Hill
  • Duplicate: http://stackoverflow.com/questions/454916/performance-of-arrays-vs-lists – Brannon Feb 21 '14 at 22:23
  • "my feeling is that a List will not perform as well as a traditional array[] when doing billions of lookups." - why do you feel that way? – Ed S. Feb 21 '14 at 22:45
  • Well, if indexing into this data structure is really important to you you can go faster than a `List`. It has additional overheads like range checking and iterator invalidation for writes. The JIT does not get rid of this. – usr Feb 21 '14 at 22:54
  • Ed - maybe I'm wrong, it just seems a List is heavier in its implementation than an array. For example, looking up by index is a function call rather than a direct memory lookup. – Brendan Hill Feb 22 '14 at 02:13
  • @EdS.: Because `List` has an extra level of indirection? – user541686 Feb 22 '14 at 04:31
  • @Mehrdad: But of course, the JIT'er could inline that indirection, making the function call a no-op. Arrays perform bounds checking, so you're probably not saving much if anything. My point was that it's silly to make such an assumption. – Ed S. Feb 22 '14 at 06:31
  • @EdS.: No, by "indirection" I wasn't referring to the function call (which is in fact a *direct* call, *not* an "indirect" call). I was referring to the extra memory load required to retrieve the array from inside the `List` object, due to the fact that `List` is a reference type. – user541686 Feb 22 '14 at 06:35
  • @Mehrdad: Sure, ok, but there are no specifics as to performance requirements and no profiling of any kind has taken place. I see no reason based upon the OP's description to dump `List` at this point. – Ed S. Feb 22 '14 at 06:37
  • @EdS.: How would it have changed anything if the OP had said *"I have profiled my code and have verified `List` is indeed the bottleneck"*? The rest of the question would have been **exactly** the same, nothing would have changed here and you wouldn't have gained any additional information. So why not just assume the OP has already done his job before asking the question, and actually answer the question with that assumption? – user541686 Feb 22 '14 at 06:40
  • @Mehrdad: I'm not saying that the question should be closed or anything, I was just prodding the OP to try and understand why he believes this to be a bottleneck. This could be a wild goose chase for all we know, and if we had actual data perhaps he could get a better answer. As an aside, I have written an image processing library in C# (not all that fun I must say...) I tested List v arrays and found the difference in regards to lookups to be pretty tiny (iirc it was like ~100ms over a 6mp image.) I did go with arrays because it made more sense, but I understood the problem. – Ed S. Feb 22 '14 at 06:43
  • @Mehrdad: I'm with you in that I don't like how optimization questions seem to get poo poo'd around here, that's why I left a comment and not an answer. – Ed S. Feb 22 '14 at 06:45
  • @EdS.: Never mind, the upvote didn't correlate with the close vote here. I feel like in general it does though, that's why I don't like these comments -- usually whenever there's a comment like that people end up voting to close the question because of it (even when the original writer didn't intend to). :( – user541686 Feb 22 '14 at 06:49
  • @Mehrdad: Yeah, I can see that happening. Perhaps I should have been a bit more specific myself. – Ed S. Feb 22 '14 at 06:52

3 Answers


Thanks for the answers. A plain old array performs better than List<>. I believe this is because adding to and retrieving from a List<> involves function calls (e.g. even mylist[i] is a call to the indexer's getter), while with an array they are direct memory operations. Resizing performs about the same.

The performance dealing with a dataset of 40,000,000 was as follows:

-                           Array     List
Instantiating:              0         0
Iterating/Populating:       324ms     471ms
Iterating/Retrieving:       215ms     380ms
Resizing:                   160ms     160ms

Therefore, while the performance differences are negligible for almost all practical applications, given the performance requirements I have I will be using plain old arrays.

Code:

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Press A for array; L for list. These are run separately to avoid memory management getting in the way of a clean test.");
        var key = Console.ReadKey(true);
        if (key.Key == ConsoleKey.L)
        {
            for (var i = 0; i < 5; i++)
            {
                Console.WriteLine("List run " + i);
                TestList();
                Console.WriteLine();
            }
        }
        else if (key.Key == ConsoleKey.A)
        {
            for (var i = 0; i < 5; i++)
            {
                Console.WriteLine("Array run " + i);
                TestArray();
                Console.WriteLine();
            }
        }

        Console.WriteLine("Done.");
        Console.ReadKey();
    }

    private const int DataSize = 40000000; // The limit of my old laptop :(

    private static int[] GetBaseData()
    {
        return new int[DataSize];
    }

    private static void TestList()
    {
        int[] baseData;

        using (time("creating base data"))
        {
            baseData = GetBaseData();
        }

        List<int> testData;
        using (time("Initialization"))
        {
            testData = new List<int>(DataSize);
        }

        using (time("Populating using FOR (not FOREACH)"))
        {
            var c = baseData.Length;
            for (var i = 0; i < c; i++)
            {
                testData.Add(baseData[i]);
            }
        }

        using (time("Iterating & retrieving with FOR (not FOREACH)"))
        {
            var c = testData.Count;
            var v = 0;
            var oi = 0;
            for (var i = 0; i < c; i++)
            {
                oi = testData[i];
                v += oi;
            }
        }

        using (time("Resizing"))
        {
            testData.Add(1); // Enough to push it over the original limit
        }
    }

    private static void TestArray()
    {
        int[] baseData;

        using (time("creating base data"))
        {
            baseData = GetBaseData();
        }

        int[] o;
        using (time("Initialization"))
        {
            o = new int[DataSize]; // SomeClass[DataSize];
        }

        using (time("Populating with FOR"))
        {
            var c = baseData.Length;
            for (var i = 0; i < c; i++)
            {
                o[i] = baseData[i];
            }
        }

        using (time("Iterating FOR"))
        {
            var v = 0;
            var c = o.Length;
            var oi = 0;
            for (var i = 0; i < c; i++)
            {
                oi = o[i];
                v += oi;
            }
        }

        using (time("Resizing"))
        {
            Array.Resize(ref o, DataSize * 2); // NOTE: this doubles to match List behavior but further efficiencies could be gained with lower values eg. 1.2
        }
    }

    private static TimeAdviser time(string message)
    {
        return new TimeAdviser(DataSize + ": " + message, (x) => Console.WriteLine(x));
    }

    private class TimeAdviser : IDisposable
    {
        public DateTime Start;
        public string Message;
        public Action<string> OnComplete;

        public TimeAdviser(string message, Action<string> onComplete)
        {
            Start = DateTime.Now;
            Message = message;
            OnComplete = onComplete;
        }

        public void Dispose()
        {
            OnComplete(Message + " time taken: " + (DateTime.Now - Start).TotalMilliseconds);
        }
    }
}
Brendan Hill
  • It's probably not the function calls, but the extra level of indirection. – user541686 Feb 22 '14 at 04:33
  • What do you mean? If a function isn't inline, it will hit the callstack, etc etc, which is quite expensive (for these purposes at least). What other indirection level is there? – Brendan Hill Feb 22 '14 at 23:03

Start with List<T>. Then use a profiler to see where the bottlenecks are in your application. If it turns out List<T> actually is a bottleneck, you can in theory optimize by duplicating what List<T> does internally and removing some of the checks you won't need on every call, since you can reliably ensure indexes are valid.

You can also optimize List<T> a bit by specifying an initial capacity at least as large as what you expect to use. This avoids repeated expansion and copying of the backing array as the list grows.
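As a minimal sketch of this (the 40,000,000 figure is just an illustrative size, borrowed from the question's benchmark):

```csharp
using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        const int expected = 40_000_000;

        // Pre-sizing the backing array avoids the repeated grow-and-copy
        // cycles that occur as the list doubles its capacity on demand.
        var populations = new List<int>(expected);

        Console.WriteLine(populations.Capacity); // 40000000
        Console.WriteLine(populations.Count);    // 0 - capacity is not count
    }
}
```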

Samuel Neff

I think Array.Resize<T> is all that the .NET Framework provides for resizing arrays, and it would be the way to reimplement your VB solution. Note that it involves a copy of the elements from the old array into the new one.
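For example, a minimal sketch of growing an array with Array.Resize<T>:

```csharp
using System;

class ResizeDemo
{
    static void Main()
    {
        var data = new int[] { 1, 2, 3, 4 };

        // Array.Resize allocates a new array of the requested length,
        // copies the old elements across, and repoints the ref parameter.
        Array.Resize(ref data, 6);

        Console.WriteLine(data.Length); // 6
        Console.WriteLine(data[1]);     // 2 - existing contents preserved
        Console.WriteLine(data[5]);     // 0 - new slots are default-initialized
    }
}
```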

If you need high performance, then I'd suggest making a class that manages a linked list of arrays. According to this answer, that concept is how StringBuilder achieves its efficiency.

But measure before you expend the labor. You might not need it.
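One possible shape for such a class (a sketch only, not tuned; the chunk size is an arbitrary choice, and a real implementation would want bounds checks):

```csharp
using System;
using System.Collections.Generic;

// Sketch of the "list of arrays" idea: growing never copies existing
// elements, at the cost of an extra index computation per lookup.
class ChunkedArray<T>
{
    private const int ChunkSize = 4096; // power of two, arbitrary
    private readonly List<T[]> _chunks = new List<T[]>();
    private int _count;

    public int Count => _count;

    public T this[int i]
    {
        get => _chunks[i / ChunkSize][i % ChunkSize];
        set => _chunks[i / ChunkSize][i % ChunkSize] = value;
    }

    public void Add(T item)
    {
        if (_count == _chunks.Count * ChunkSize)
            _chunks.Add(new T[ChunkSize]); // grow without copying old data
        this[_count] = item;
        _count++;
    }
}
```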

jltrem