16

I need to iterate over a List<myObject> and remove items that answer a certain condition.

I saw this answer (https://stackoverflow.com/a/1582317/5077434):

Iterate your list in reverse with a for loop:

for (int i = safePendingList.Count - 1; i >= 0; i--)
{
    // some code
    // safePendingList.RemoveAt(i);
}

Example:

var list = new List<int>(Enumerable.Range(1, 10));
for (int i = list.Count - 1; i >= 0; i--)
{
    if (list[i] > 5)
      list.RemoveAt(i);
}
list.ForEach(i => Console.WriteLine(i));

But I understood that for is less efficient than foreach,

So I thought of using the later as follows:

foreach (var item in myList.ToList())
{
    // if certain condition applies:
    myList.Remove(item)
}

Is one method better than the other?

EDIT:

I don't want to use RemoveAll(...), as there is a large amount of code inside the loop, prior to the condition.

Community
  • 1
  • 1
Aviran Katz
  • 741
  • 2
  • 8
  • 26
  • 3
    "better" as in what? performance? memory? legibility? – Jcl Apr 18 '16 at 09:29
  • 3
    `for` is not less efficient than `foreach`. And instead of a loop, you can use `RemoveAll`, which has even better performance: https://msdn.microsoft.com/en-us/library/wdka673a%28v=vs.110%29.aspx – Dennis_E Apr 18 '16 at 09:30
  • @Jcl 1. Better performence 2. Better coding practice – Aviran Katz Apr 18 '16 at 09:31
  • 11
    You can't remove items from a the same list you're using in the foreach, you'll get an exception (System.InvalidOperationException). – MeTitus Apr 18 '16 at 09:32
  • For sequences that give you give you constant time index-based access to elements (e.g. lists, arrays), a standard `for` loop is guaranteed to be more efficient than a `foreach` simply because of the iterator protocol overhead of the latter… not that it matters much since the actual item removal (and the subsequent array reshifting) is the expensive part. – poke Apr 18 '16 at 09:33
  • @Marco You missed the ToList in his code. – Ralf Apr 18 '16 at 09:33
  • If there is a large amount of code in the loop then simply put it in a method and call it in the Predicate of RemoveAt. – Ralf Apr 18 '16 at 09:40
  • 1
    There is no such thing as `RemoveAll()` in LINQ. – Matthew Watson Apr 18 '16 at 09:45
  • 2
    @poke That is CERTAINLY not true for arrays, for which `foreach` is highly optimised. – Matthew Watson Apr 18 '16 at 09:47
  • As Jcl's comment hinted at, this question is primarily opinion-based (Is X better than Y?") and is off-topic for SO. – TylerH Apr 26 '16 at 19:45

5 Answers5

25

Willy-nilly you have to loop over the list, and the for loop is the most efficient one:

  for (int i = safePendingList.Count - 1; i >= 0; --i) 
    if (condition)
      safePendingList.RemoveAt(i);

If you want to remove in range (not in the entire list), just modify for loop:

  // No Enumarable.Range(1, 10) - put them into "for"
  for (int i = Math.Min(11, safePendingList.Count - 1); i >= 1; --i)
    if (condition)
      safePendingList.RemoveAt(i); 

Or if you have to remove items in forward looping:

  for (int i = 0; i < safePendingList.Count;) // notice ++i abscence
    if (condition)
      safePendingList.RemoveAt(i);
    else
      i += 1; // ++i should be here

On the contrary safePendingList.ToList() creates a copy of initial safePendingList and this means memory and CPU overhead:

  // safePendingList.ToList() - CPU and Memory overhead (copying)
  foreach (var item in safePendingList.ToList()) {
    if (condition)
      myList.Remove(item); // Overhead: searching
  }

However, the most reasonable plan in many cases is just to let .Net work for you:

  safePendingList.RemoveAll(item => condition);
Dmitry Bychenko
  • 180,369
  • 20
  • 160
  • 215
  • One thing to remember is that using `RemoveAt()` is much less efficient than `RemoveAll()` because it will keep shuffling up all elements at an index greater than the removed one. – Matthew Watson Apr 18 '16 at 10:13
  • 3
    `safePendingList.RemoveAll(item => condition);` is the closest to a higher-order `filter`y function – cat Apr 18 '16 at 13:04
  • 2
    As well as being short and readable, `RemoveAll` has the major benefit that it takes linear time, regardless of how many or which items need to be removed. Solutions based on `RemoveAt` may take quadratic time due to needing to shift large sections of the list repeatedly. – user2357112 Apr 18 '16 at 18:24
  • you sucked my answer into yours :-\ – fubo Nov 15 '20 at 10:33
9

to remove items with a certain condition you can use

list.RemoveAll(item => item > 5);

instead of

var list = new List<int>(Enumerable.Range(1, 10));
for (int i = list.Count - 1; i >= 0; i--)
{
    if (list[i] > 5)
        list.RemoveAt(i);
}
fubo
  • 44,811
  • 17
  • 103
  • 137
4

The problem is that foreach cannot work with collection while it is being modified this will lead to InvalidOperationException.

Using .ToList() inside foreach allows us to avoid that but this copies the entire collection it result in looping through collection twice (first time when it copies all elements and second time to filter it).

The for loop is better option because we can go through collection once. Apart of that you can filter collection using LINQ Where() method, something among the lines of :

var myFilteredCollection = myList.Where(i => i <= 5).ToList();

If your conditions are complex you can move this logic to separate method :

private bool IsItemOk(myObject item)
{
   ... your complicated logic.
   return true;
}

And use it in your LINQ query :

var myFilteredCollection = myList.Where(i => IsItemOk(i)).ToList();

Other option is to do it vice-versa. Create new collection and add elements based on condition. Than you can use foreach loop :

var newList = new List<MyObject>(myList.Count);

foreach (var item in myList)
{
    // if certain condition does not apply:
    newList.Add(item);
}

P.S.

But as already mentioned .RemoveAll() is better because this way you do not have to 're-invent the wheel'

Fabjan
  • 13,506
  • 4
  • 25
  • 52
4

Neither method shows significant differences. for and RemoveAll looks a bit faster (more likely because using Remove you are duplicating your list and it needs to recreate it... using value types, this also means doubling the needed memory). I made some profiling:

public static void MethodA()
{
    var list = new List<int>(Enumerable.Range(1, 10));
    for (int i = list.Count - 1; i >= 0; i--)
    {
        if (list[i] > 5)
            list.RemoveAt(i);
    }
}

public static void MethodB()
{
    var list = new List<int>(Enumerable.Range(1, 10));
    foreach (var item in list.ToList())
    {
        if (item > 5)
            list.Remove(item);
    }
}

public static void MethodC()
{
    var list = new List<int>(Enumerable.Range(1, 10));
    list.RemoveAll(x => x > 5);
}

public static void Measure(string meas, Action act)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    var st = new Stopwatch();
    st.Start();
    for (var i = 0; i < 10000000; i++)
        act();
    st.Stop();
    Console.WriteLine($"{meas}: {st.Elapsed}");
}

static void Main(string[] args)
{
    Measure("A", MethodA);
    Measure("B", MethodB);
    Measure("C", MethodC);
}

The result:

A: 00:00:04.1179163

B: 00:00:07.7417853

C: 00:00:04.3525255

Other runs give even more similar results (note that this is for 10 million iterations on a 10 item list [thus doing 100 million operations]... a difference of ~3 seconds is not what I'd call "significant", but your mileage may vary)

So go with what you think you can read better (I'd personally use the RemoveAll method)

Note: as you grow your list (this is just 10 items), MethodB (the foreach one that duplicates the list) will be slower and the differences will be higher, since the duplication of the list will be more costly. For example, doing a list of 1,000 items (with the condition being on 500 instead of 5) and 100,000 iterations, makes MethodB be 34 seconds, while both others run under 2 seconds on my machine. So yes, definitely don't use MethodB :-)

This doesn't mean that foreach is less efficient than for, it only means that for using foreach to remove items from a list, you need to duplicate said list, and that is costly.

This other method (using foreach) should be about as fast as the for or RemoveAll methods (it's a tiny bit slower in my profiling -because it allocates a new list, I figure-, but pretty much negligible):

public static void MethodD()
{
    var originalList = new List<int>(Enumerable.Range(1, 1000));
    var list = new List<int>(originalList.Count);
    foreach (var item in originalList)
    {
        if (item <= 500)
            list.Add(item);
    }
    list.Capacity = list.Count;
}

You just need to reverse the condition.

I'm not suggesting this method, just proving that foreach is not necessarily slower than for

Jcl
  • 27,696
  • 5
  • 61
  • 92
  • 1
    Right, if you want to see which horse is faster… And often the best policy is to let .Net do its work - `RemoveAll`, +1 – Dmitry Bychenko Apr 18 '16 at 09:49
  • @DmitryBychenko I was expecting this result from theory, but yes, nothing like racing the horses :-) – Jcl Apr 18 '16 at 09:51
  • I had no doubt you *had known*, but since there's "...But I understood that `for` is less efficient than `foreach`..." in the question, the best way to ensure that quite the *contrary* is true is to let horses run. – Dmitry Bychenko Apr 18 '16 at 09:58
  • A 10 element list is far too small for this kind of test, because the overhead of memory allocation etc is going to distort the results. Try it with a list of a million elements (removing all items with an index less than 500,000) – Matthew Watson Apr 18 '16 at 10:16
  • @MatthewWatson yeah, i noted that in a update – Jcl Apr 18 '16 at 10:17
  • Why are you adjusting `Capacity` in `MethodD` and not in any of the others? – Johnbot Apr 18 '16 at 11:59
  • @Johnbot because I've set the capacity explicitly higher in that method (on the list constructor), and haven't in the others. It's definitely not needed but since I know it'll more likely be higher (in my case, twice the needed one), I'd rather adjust it after I'm done – Jcl Apr 18 '16 at 12:01
  • But in all the other methods you've still got a list with the initial capacity of at least `originalList.Count`. – Johnbot Apr 18 '16 at 12:04
  • @Johnbot yeah, I suppose you are right... I guess I should have adjusted in the others too, but frankly, `MethodD` is the only one that creates a list I actually not copied from the original question and wrote myself, so I didn't give it much thought to the others :-) – Jcl Apr 18 '16 at 12:05
  • 1
    The difference in performance between `MethodA` and `MethodB` is most likely due to using `Remove` rather than `RemoveAt`. `Remove` has to search the whole list for the item in question, while with `RemoveAt`, you're telling it exactly where to find it. – Darrel Hoffman Apr 18 '16 at 14:18
  • @DarrelHoffman you are totally spot on there – Jcl Apr 18 '16 at 16:50
1

Another way could be to foreach the list, set the item to null and then do a linq remove to (item => null), as already said, it's up to you to choose the way that suits you better

Pietro
  • 988
  • 10
  • 19