8

Does the iteration speed through an array (or list,linkedList,Dictionary ect) depend on the data type?

Example: An array of 10 bools v/s an array of 10 integers?

svick
  • 236,525
  • 50
  • 385
  • 514
orukinawa
  • 99
  • 5
  • 4
    Try it and you'll find out. – Eric Lippert May 11 '13 at 06:03
  • 1
    @EricLippert at least you should tell him how to try? You think he is a senior programmer (or at least with 3-4 years of experience in professional programming), don't you? – King King May 11 '13 at 06:13
  • 8
    @KingKing: If the OP is unable to write a program that iterates through two arrays then what does it matter how fast the program they cannot write is? They can't write it. (That said, it can be difficult to write a correct benchmark; this will be the subject of a series of articles by me coming up soon; watch my blog for details.) – Eric Lippert May 11 '13 at 06:16
  • 2
    My standard rant on why you should not ask this question on StackOverflow is here: http://ericlippert.com/2012/12/17/performance-rant/ – Eric Lippert May 11 '13 at 06:16
  • I just thought to test any thing related to performance speed, a newbie would feel uneasy because the main problem is to implement a benchmark program, glad to watch your coming soon series of articles, is your blog ericlippert.com ? – King King May 11 '13 at 06:19
  • 1
    @KingKing: Yes. Now, that said, the key lesson here is that *performance differences are only interesting if the user can notice the difference between good and bad performance*. So, write a program both ways, run it both ways -- could you tell the difference? If not, then it doesn't matter which one is faster! It doesn't matter which invisible unicorn is prettier. ] – Eric Lippert May 11 '13 at 06:30
  • @EricLippert That's of course correct, I usually use Selection Sort for small arrays sorting (of course in another environment, not in.NET where I can use built-in sorting method), don't care about other higher efficient sorting algorithms like quick sort or merge sort. However when the number of calculations is large, the difference between good and bad methods will be visible and noticeable. The OP may want to mean the performance comparison in such a case. – King King May 11 '13 at 06:45
  • Like you said, I could write a simple program to test it but I wouldn't know how to test the results efficiently (best I can think of is iterating through huge arrays with a stop watch in my hand) I'm also aware that those differences cannot really be noticed so why bother you ask me? Well I'm just curious that's all... – orukinawa May 11 '13 at 06:47
  • 1
    @user2320115 use a programmatic [stopwatch](http://msdn.microsoft.com/en-us/library/system.diagnostics.stopwatch.aspx) – James May 11 '13 at 07:16
  • In my opinion, this question is too big to be covered properly and completely. – Chibueze Opata May 11 '13 at 07:30

3 Answers3

2

Yes, the datatype matters. It has nothing to do with the iteration; it has everything to do with the datatypes.

Value types

An int is 4 bytes in length. A decimal is 16 bytes in length. So a decimal is 4 times bigger than an int. Every time you retrieve a value from the array the that value is copied. In case of a decimal al 16 bytes are copied. (In case of a reference type the reference is copied, normally 4 or 8 bytes). Copying more bytes will simply slow down the iteration.

Boxing

If you iterate trough a collection, there is also the possibility that you have change type. For example:

foreach(object o in new int[] { 1,2,3 })
     ....

This will box every int to an object. This takes time. That has nothing to do with the iteration, it has everything to do with the fact that you are boxing.

Casting

Last example: There are also arrays in where you have to cast:

foreach(Person p in new object[] { ... })
     ....

Casting also takes extra time.

EDIT

Some time measurements to backup my claim:

Times in milliseconds. Arrays are of size 10,000. Iterations also 10,000.

Martin Mulder
  • 12,642
  • 3
  • 25
  • 54
  • Actually, when you have a `List`, the data copying doesn't matter as the data is presented *as-is* to the iterator. Unboxing is avoided and there is no real loss with boxing/unboxing. This is why ArrayList should be avoided since they behave in this manner. See [MSDN Remark on this](http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx). – Chibueze Opata May 12 '13 at 10:46
  • @Chibueze: Where in my text am I saying that a decimal is boxing or unboxing? – Martin Mulder May 12 '13 at 10:47
  • +1 for the screenshots. Your key point is that data is copied during iteration and that copying of that data will create a time variation. Also, you are forcing boxing and unboxing which is not what the question is about. The question is asking specifically about iterating through arrays or collections. That is not how collections such as List is implemented internally. – Chibueze Opata May 12 '13 at 11:06
  • I am not talking about how the collection work internally. I was talking about HOW you iterate through a collection. And yes, I am forcing this situation in my examples, because I see these kind of iterations many times (with boxing, unboxing, casting, etc.) – Martin Mulder May 12 '13 at 11:22
  • Actually forcing boxing and unboxing is unnecessary and should be advised against in a real life scenario. – Chibueze Opata May 12 '13 at 11:42
  • @Chibueze: True, but they cannot always be avoided. For example, using a collection of basetypes (ie `Person`, which contain different instances of different inherited types (ie `Employer` and `Employee`). Or a list with an interface (ie `IComparable`) and different implementations of it (`int` and `decimal`). – Martin Mulder May 12 '13 at 11:53
1

Run the code below if you want , but here's a quick comparison. All that it does is iterate over the array/list, and set a temp variable to the value in that index.

3rd run, high priority Threw some more types in

Do note that somehow the Int performance took a hit when running now ... no idea why ... but it happens on repeated runs as well ...

    namespace Iterating_types
    {
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Text;
class Program
    {
        static void Main(string[] args)
        {
            Thread.CurrentThread.Priority = ThreadPriority.Highest;
            Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.RealTime;

            Stopwatch watch = new Stopwatch();
            int UPPER = 1000000;
            int[] int_arr = Enumerable.Range(1, UPPER).ToArray();
            List<int> int_list = Enumerable.Range(1, UPPER).ToList();

            Int32[] int32_arr = Enumerable.Range(1, UPPER).ToArray();

            Int64[] int64_arr = new Int64[UPPER]; 

            IntObject[] intobject_arr = new IntObject[UPPER];
            List<IntObject> intobject_list = new List<IntObject>();

            string[] string_arr = new string[UPPER];
            List<string> string_list = new List<string>();

            bool[] bool_arr = new bool[UPPER];
            Boolean[] boolean_arr = new Boolean[UPPER];
            List<bool> bool_list = new List<bool>();
            List<Boolean> boolean_list = new List<Boolean>();
            // Initializing some of the above
            for (int i = 0; i < UPPER; i++)
            {
                int64_arr[i] = (Int64) i;
                string_arr[i] = i.ToString();
                string_list.Add(i.ToString());
                intobject_arr[i] = new IntObject(i);
                intobject_list.Add(new IntObject(i));
                bool_arr[i] = (i%2 ==0);
                boolean_arr[i] = (i%2 ==0);
                bool_arr[i] = (i%2 ==0);
                bool_list.Add(i%2 ==0);

                boolean_list.Add(i%2 == 0);
            }

            Console.WriteLine("Iterations: {0}{1}", UPPER, Environment.NewLine);
            Console.WriteLine("Thread priority: {0}", Thread.CurrentThread.Priority);
            Console.WriteLine("Process priority: {0}", Process.GetCurrentProcess().PriorityClass);

            Console.WriteLine("\n\rArrays:\t----------------------------------------------");

            bool b;
            b = bool_arr[1];
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                b = bool_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: bool\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                b = boolean_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: Boolean\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            int temp_int;
            temp_int = int_arr[1];
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                temp_int = int_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: Int\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            Int32 temp_int32 ;
            temp_int32 = int32_arr[1];
            watch.Reset();
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                temp_int32 = int32_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: Int32\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            Int64 temp_int64 ;
            temp_int64 = int64_arr[1];
            watch.Reset();
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                temp_int64 = int64_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: Int64\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            string s ;
            s = string_arr[1];
            watch.Reset();
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                s = string_arr[i];
            }
            watch.Stop();
            Console.WriteLine("Type: string\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            temp_int = intobject_arr[1].IntValue;
            watch.Reset();
            watch.Start();
            for (int i = 0; i < UPPER; i++)
            {
                temp_int = intobject_arr[i].IntValue;
            }
            watch.Stop();
            Console.WriteLine("Type: IntObject\tStructure: Array\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            Console.WriteLine("\n\rLists:\t----------------------------------------------");

            watch.Reset();
            watch.Start();
            foreach (var val in bool_list)
            {
                b = val;
            }
            watch.Stop();
            Console.WriteLine("Type: bool\tStructure: List\t\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            watch.Reset();
            watch.Start();
            foreach (var val in boolean_list)
            {
                b = val;
            }
            watch.Stop();
            Console.WriteLine("Type: Boolean\tStructure: List\t\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            temp_int = int_list.First();
            watch.Reset();
            watch.Start();
            foreach (var val in int_list)
            {
                temp_int = val;
            }
            watch.Stop();
            Console.WriteLine("Type: Int\tStructure: List\t\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            s = string_list.First();
            watch.Reset();
            watch.Start();
            foreach (var val in string_list)
            {
                s = val;
            }
            watch.Stop();
            Console.WriteLine("Type: string\tStructure: List\t\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            temp_int = intobject_list.First().IntValue;
            watch.Reset();
            watch.Start();
            foreach (var val in intobject_list)
            {
                temp_int = val.IntValue;
            }
            watch.Stop();
            Console.WriteLine("Type: IntObject\tStructure: List\t\tticks: {0}\tMiliSeconds:{1}", watch.ElapsedTicks, watch.ElapsedMilliseconds);

            Console.WriteLine();
            Console.WriteLine("Hit any key to exit.");
            Console.ReadKey();


        }
    }

    class IntObject
    {
        public int IntValue { get; set; }

        public IntObject ()
        {
            IntValue = 0;
        }

        public IntObject(int i)
        {
            IntValue = i;
        }
    }
}
Noctis
  • 11,507
  • 3
  • 43
  • 82
  • measuring performance is art. It looks that Int64 takes more time than an Int32, but as much time as an int. This is not correct. An int is an alias for Int32 and should perform equally fast. – Martin Mulder May 11 '13 at 11:16
  • Well, it seems it does :) I've updated the code and screenshot, numbers are about the same. Note that a milliseconds has 10,000 ticks, but the numbers i see in the output don't seem to be very precise. Even then, this is for a million iterations, so in the end, it doesn't really matter. – Noctis May 11 '13 at 14:15
  • The more iterations of the test, the more quircky things you rule out. A little advices for the next time: 1) always run a test first (one single iteration) without time measurement. During this iterations some assemblies could be loaded (which could have slown down the real test). 2) Set your process- and thread priority to the highest value, preventing contextswitchting as much as possible, hindering the testresults. – Martin Mulder May 12 '13 at 10:26
  • Or you could do some statistical treatment of the data instead of just rely on mean values or the result of large iterations, like calculating confidence intervals. – Caian May 12 '13 at 11:50
  • Martin: Followed suggestions, can't say the results differ :) Caian : This is why the small fish shouldn't swim with the sharks ... been a while since Uni, and we didn't do confidence intervals (or at least, i can't remember it :) – Noctis May 12 '13 at 14:03
-1

The simple answer is YES for reference types, and NO for value types.

This is because, implementation of .NET generics are done in such a way that it is boxing/unboxing is avoided when using Value Types though not in ArrayLists. For instance a List<int> would store the array integers directly as integers on the heap rather than as objects. In the case of reference types e.g. List<string>, List<person> however, there will be a little time loss in conversion/casting from object to data type.

See comparison between HashSet and List using strings and objects.

Deciding which one to use between List, LinkedList, Dictionary, HashSet, etc. when you are doing a large number of iterations is mainly a matter of understanding how they are stored, and their runtime complexities. Below is a list of the implementation and asymptotic indexing/iteration time complexities of some of the .NET Generics:

Internal Implementation / Asymptotic Time Complexities for .NET Generics


+------------------+---------------------------------------+-------------------------+-------------+
|                  |                                       |         Item[i]         |             |
| Name             | Internal Implementation               |------------+------------| Iteration   |
|                  |                                       | Avg. Case  | Worst Case |             |
+------------------+---------------------------------------+------------+------------+-------------+
| List             | Array                                 | O(1)       | O(1)       | O(n)        |
| LinkedList       | Doubly Linked List                    | O(n)       | O(n)       | O(n)        |
| Dictionary       | Hashtable with links to another array | O(1)       | O(n)       | O(n)        |
| HashSet          | Hashtable with links to another array | O(1)       | O(n)       | O(n)        |
| SortedDictionary | Red-black tree                        | O(log n)   | O(log n)   | O(n)        |
| SortedList       | Array                                 | O(1)       | O(n)       | O(n)        |
| SortedSet        | Red-black tree                        | O(n)       | O(n)       | O(n)        |
+------------------+---------------------------------------+------------+------------+-------------+

To summarize, one can determine the most likely speed of iterating through these data types based on their time complexities. As far as it concerns fast finding of items, List, SortedList, Dictionary, and HashSet will always out-perform others, however List and SortedList are not advisable to use if you are going to be handling a large number of items which then puts Dictionary and HashSet on the advantage for large lists (where performance matters most).

References:

  1. Runtime Complexity of .NET Generic Collection
  2. Comparative Analysis of List, HashSet and SortedSet
  3. Time complexity overview: Dictionary classes
  4. Generic Collections in C# - Time complexity overview
  5. Algorithms: Big-Oh Notation
  6. Sorted Dictionary(Tkey, Tvalue) - MSDN
  7. List(T) - MSDN

Glossary:

  1. Red-black tree
  2. Boxing and Unboxing
Community
  • 1
  • 1
Chibueze Opata
  • 9,856
  • 7
  • 42
  • 65
  • A beautyfull answer, but I do not think you answered the real question. First I think your first line should be: "NO for reference types, and YES for value types", since the question is, in short: "Does iteration speed through an array depend on the data type". He is not asking about differences in collection types. Furthermore you combine the `Item[i]` and `Find(i)` which are totally different operations. In an array `Item[i]` is an O(1) operation, but `Find(i)` is an O(n) operation. – Martin Mulder May 12 '13 at 10:32
  • No. Actually, it matters for reference types but not for value types and not the other way round. See [MSDN Remarks](http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx). As for the Item[i]/Item find title, maybe it's ambiguous but it's not intended to mean the same thing as List.Find( match) which you're referring to. – Chibueze Opata May 12 '13 at 10:37
  • Please response on the first half of my comment. – Martin Mulder May 12 '13 at 10:46
  • And your point is that the extra detail was unnecessary or what? – Chibueze Opata May 12 '13 at 10:51
  • Also, you did mention in your answer that casting takes time so why saying NO for reference types? – Chibueze Opata May 12 '13 at 11:00
  • Yes, casting takes time for reference types AND values types. My coffee was not working. It should be YES+YES, depending on the situation. Without casting and without (un)boxing, it would have been NO for reference types, YES for value types. – Martin Mulder May 12 '13 at 11:02