4

After asking my question on Code Review.stackexchange I got advised to use the following code pieces. And I noticed that during the transfer the string[] Lines gets set over to a IEnumerable.

After looking around the IEnumerable function for a bit I did not find anything that suggested any performance improvement. So Is this just for readability? Or is there actually a performance difference or general advantage using the IEnumerable instead of an array?

private void ProcessSingleItem(String fileName, String oldId, String newId)
{
    string[] lines = File.ReadAllLines(fileName);

    File.WriteAllText(fileName, ProcessLines(lines, oldId, newId));
}  

private String ProcessLines(IEnumerable<String> lines, String oldId, String newId)
{
    StringBuilder sb = new StringBuilder(2048);
    foreach (String line in lines)
    {
        sb.AppendLine(line.Replace(oldId, newId));
    }
    return sb.ToString();
}  
Community
  • 1
  • 1
MX D
  • 2,453
  • 4
  • 35
  • 47
  • it depends on how the `IEnumerable` is implemented in terms of performance. – Daniel A. White Oct 03 '14 at 11:44
  • @DanielA.White What exactly do you mean with 'implemented'? – MX D Oct 03 '14 at 11:45
  • it could be a `HashSet`, an array, a `List`, etc. Each would enumerate it depending on how its stored. – Daniel A. White Oct 03 '14 at 11:46
  • Possible duplicate of http://stackoverflow.com/questions/1124753/performance-difference-for-control-structures-for-and-foreach-in-c-sharp – mfarouk Oct 03 '14 at 11:47
  • 2
    If you write your code to take an `IEnumerable`, then you don't care if it is an `Array`, `List`, or any other object as long as it implements `IEnumerable`. This is great if you want to pass results of a `Where`. In other words, it is more about flexibility than performance in my mind. – crashmstr Oct 03 '14 at 11:48
  • 2
    Its not a performance thing, it is sensible that the method takes an IEnumerable parameter because the you can pass in an type that implements IEnumerable, param was `string[] lines` then you could only pass in an array of strings. – Ben Robinson Oct 03 '14 at 11:48
  • @mfarouk this is more to do with function signature than performance, so not a duplicate (at least of that question). – crashmstr Oct 03 '14 at 11:50
  • Simple. One takes advantage of object oriented programming architecture, while the other does not. – theMayer Oct 03 '14 at 11:51
  • 1
    Your question implies that there are only two things worth considering when making a choice: performance and readability. Though those are surely important, there are many other concerns. – Eric Lippert Oct 03 '14 at 12:38
  • @EricLippert Did not mean to imply it that way, that is also why I added general advantage to my question as statement. – MX D Oct 03 '14 at 12:43

5 Answers5

15

So far all the answers have said that being more general in what you accept makes your helper method more useful. This is correct. However, there are other considerations as well.

  • Pro: Taking a sequence rather than an array communicates to the developer who is calling your code "I am not going to mutate the object you pass me". When I call a method that takes an array, how do I know that it isn't going to change the array?

  • Con: taking a more general type means that your method must be correct for any instance of that more general type. How do you know it is correct? Testing. So taking a more general type can mean a larger test burden. If you take an array then you only have a few cases: null arrays, empty arrays, covariant arrays and so on. If you take a sequence the number of cases to test is larger.

  • You mentioned performance. Remember, making micro-decisions based on feelings rather than empirical data is a terrible way to get good performance. Instead, set a performance goal, measure your progress against that goal, and use a profiler to find the slowest thing; attack that first. A foreach loop on an array compiles down to the equivalent for loop; on an IEnumerable the code is more complicated and possibly several microseconds slower. Is the success or failure of your application in the marketplace going to be determined by those microseconds? If so, then carefully measure the performance. If not, write the code the way you like it, and if you've introduced a regression that causes you to no longer meet your goal, your automated tests will tell you about it. You are running automated performance tests, right? If you care so much about performance, you ought to be.

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • I run allot of performance tests yes, all tough they are a bit limited (it runs over the unity engine). Just trying to improve the performance eating parts right now. – MX D Oct 03 '14 at 12:58
  • @MXD: Sounds like you are on the road to good performance then! – Eric Lippert Oct 03 '14 at 13:08
  • Since you mentioned Unity, here's something to consider: both `IEnumerable` and `T[]` will allocate onto the managed heap when iterated using a `foreach` loop. It's often a better idea to either use a `for` loop with an array, or a `foreach` loop with a specific collection type, not because one is faster than the other, but because doing so reduces pressure on Unity's decrepit, ancient version of the Mono garbage collector. Collections are often a major source of performance issues when writing games with managed code. – Cole Campbell Oct 03 '14 at 15:47
  • 2
    @ColeCampbell: What causes the allocation in the case of an array? A compiler is permitted to implement `foreach` on an array as thought it were a `for` loop, and the Microsoft C# compiler does so. Does the compiler used by Unity not do so? – Eric Lippert Oct 03 '14 at 16:41
  • @EricLippert: I'm not sure; I had actually entirely forgotten that the Microsoft C# compiler does this. I have no reason to believe that Mono is any different. Thanks for pointing out the mistake! – Cole Campbell Oct 03 '14 at 17:41
12

An array is an implementation ofIEnumerable.

The only members used from the lines parameter in your method, are those defined in IEnumerable. So you can choose IEnumerable as a parameter type and be the least picky in what you accept, allowing any IEnumerable implementation to be supplied as parameter.

Consider this example:

ProcessLines(GetItems(), ...);

public IEnumerable<string> GetItems()
{
    yield return "ItemAlwaysGetsIncluded";

    if (!once_in_blue_moon)
    {
        yield break;
    }

    yield return "ItemIncludedOnceInABlueMoon";
}

It's hard to tell what the performance impact is, since IEnumerable could be just about anything.

C.Evenhuis
  • 25,996
  • 2
  • 58
  • 72
4

Have a look at the Law of Demeter. You want to have parameters as generic as possible so that they can be used in more situations.

Now you pass in any collection that implements IEnumerable and are not only limited to arrays.

In this case it is more a design issue than a performance issue. Returning values are a different case though as there are definitely some performance gains there.

In this case, you are just iterating so IEnumerable is all that you need

David Pilkington
  • 13,528
  • 3
  • 41
  • 73
  • Though your suggestion -- which is a good one -- is in harmony with the goals of the so-called "Law of Demeter", that is not how the so-called "law" is usually described. The "Law of Demeter" is usually meant to be the rule that if method M needs service C, takes object A, which provides service B, which provides service C, then in M saying "A.B.C" to get at C is wrong; M should either take a C, or A should be modified to provide service C directly, not through B. This version of the so-called "law" is in my opinion kinda crazy; your characterization makes sense. – Eric Lippert Oct 03 '14 at 12:36
3

IEnumerable provides only minimal "iterable" functionality. You can traverse the sequence, but that's about it. This has disadvantages -- for example, it is very inefficient to count elements using IEnumerable, or to get the nth element -- but it has advantages too -- for example, an IEnumerable could be an endless sequence, like the sequence of primes.

Array is a fixed-size collection with random access (i.e. you can index into it)

WAQ
  • 2,556
  • 6
  • 45
  • 86
1

In the code above it doesn't make any difference as you have materialized array, IEnumerable would really help if you use

System.IO.StreamReader file = new System.IO.StreamReader("c:\\test.txt");

while((line = file.ReadLine()) != null)
{
   yield return line;
}

file.Close();

here So in above code we will have only one line in memory all the time, helping read larger size of file with only few bytes of memory consumption.

Also general law is while passing input parameter to function type should be widely acceptable, so as IEnumerable is implemented by all the collections and collection based interfaces its good idea to have type of parameter as IEnumerable so that you can pass List or any other collection type.

Noctis
  • 11,507
  • 3
  • 43
  • 82
Rajnikant
  • 2,176
  • 24
  • 23