3

This is an example: originalList is a list of objects

var subList = (originalList.Where(x => x.number < 0)).ToList();
originalList.RemoveAll(x => x.number < 0);

I will use the subList later. In this example the originaList is iterated two times. This function is called billions of times and originalList is a large List

Is there a easy way to improve the performance?


One important thing: The value of number of the object can change between two calls of this function.

Andreas
  • 5,393
  • 9
  • 44
  • 53
  • This feels like it might be an [XY problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) - do you have to use a `List`? A better data structure could be more optimal. – NetMage Jan 31 '22 at 19:58
  • For example, using a `LinkedList` instead of a `List`, I get a `RemoveAllAndReturn` method that runs 50 to 200 times faster then a `List` version, depending on the frequency of removal. – NetMage Jan 31 '22 at 20:23
  • No, i dont have to use a List. Very thanks, i will test now. – Libério Martins Jan 31 '22 at 20:57
  • Note that `LinkedList` takes more space - if this is critical, or time is really critical, it might be more efficient to implement a singly linked list instead. – NetMage Jan 31 '22 at 20:58

3 Answers3

1

An efficiency improvement (though still ultimately O(n)) is to batch the removals together. My testing shows that depending on the frequency of removal, this can be the same speed or over 4 times faster. Here is the function as an extension method:

public static List<T> RemoveAllAndReturn<T>(this List<T> input, Func<T, bool> condition) {
    List<T> result = new List<T>();
    var removeCt = 0;
    for (int i = input.Count - 1; i >= 0; --i) {
        if (condition(input[i])) {
            result.Add(input[i]);
            ++removeCt;
        }
        else if (removeCt > 0) {
            input.RemoveRange(i + 1, removeCt);
            removeCt = 0;
        }
    }
    if (removeCt > 0)
        input.RemoveRange(0, removeCt);
    return result;
}
NetMage
  • 26,163
  • 3
  • 34
  • 55
0

This method removes all elements fulfilling a condition and returns a list of the removed elements. It only iterates once.

public static List<T> RemoveAll<T>(List<T> input, Func<T,bool> condition)
{
    List<T> removedEntries = new List<T>();
    int offset = 0;
    for(int i = 0; i < input.Count - offset; i++)
    {
      while(i < input.Count - offset && condition.Invoke(input[i + offset]))
      {
         removedEntries.Add(input[i + offset]);
         offset++; 
         Console.WriteLine("i="+i+", offset="+offset);
      }
    
      if(i < input.Count - offset)
      {
         input[i] = input[i+offset];
      }
    }
    input.RemoveRange(input.Count - offset, offset);
    return removedEntries;
}

We loop over the list and check if an element matches the condition. If the condition is matched, the element after that element is copied into the position. So all elements that don't fulfill the condition will be at the start of the list, and all elements that fulfill the condition are at the end of the list. In the last step, the elements at the end of the list are removed.

It might be wise to give an initial capacity to the removedEntries list. Per default, a list has a capacity of 4, which is doubled every time it is exceeded. If you have 100 elements to be removed, the capacity has to be extended 5 times. This is an O(n) operation each time. If you can estimate that you will delete about 10% of the elements, you might write

List<T> removedEntries = new List<T>(input.Count / 10);

This might save you some time, but on the other hand, if you don't need the full initial capacity of the list, you waste some memory.

Online demo: https://dotnetfiddle.net/dlthkH

SomeBody
  • 7,515
  • 2
  • 17
  • 33
  • 2
    [`List.RemoveAt`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.removeat?view=net-6.0#remarks) is an `O(n)` operation, where `n` = Count - index. – Johnathan Barclay Jan 31 '22 at 14:49
  • 2
    OPs code may therefore be more efficient, depending on which / how many elements are being removed. – Johnathan Barclay Jan 31 '22 at 14:59
  • This is not thread-safe. The OP says: *"One important thing: the value of number of the object can change between two calls of this function."*. You might consider using [Thread-Safe Collections](https://learn.microsoft.com/en-us/dotnet/standard/collections/thread-safe/). – Olivier Jacot-Descombes Jan 31 '22 at 15:01
  • 2
    I do not think thread safe collections will help much unless you are adding/removing objects from the list concurrently. The degree of synchronization needed will depend on what guarantees the code need to fulfill, and that is not described clearly in the OP. – JonasH Jan 31 '22 at 15:18
  • 1
    Why are you doing `condition.Invoke(input[i])` instead of `condition(input[i])`? – NetMage Jan 31 '22 at 19:33
  • @JohnathanBarclay I edited the answer to add an improved version which doesn't use RemoveAt. The total algorithm should now be `O(n)`. @NetMage Because I like it more, no special reason. – SomeBody Feb 01 '22 at 06:47
  • Ah yes, that's right.. – Johnathan Barclay Feb 01 '22 at 09:50
  • @SomeBody, you are right. Removed the comment. – Olivier Jacot-Descombes Feb 01 '22 at 13:50
  • Your code has a bug - it is testing one off from copying. Try sample data `-1, -3, 5, 10, 4, -2, 1`. – NetMage Feb 01 '22 at 19:25
  • @NetMage Thanks for the hint. The bug is now fixed. – SomeBody Feb 02 '22 at 06:20
  • I assume you didn't mean to leave the debug `Console.WriteLine` in the code :) Also, since you are accessing via `i + offset` I would suggest it is clearer to write your tests using that expression (e.g. `i + offset < input.Count`). – NetMage Feb 02 '22 at 19:01
  • 1
    Running some tests, this is faster than even using a `LinkedList` when the removal percentage is 13% or over; under 13% removal, a `LinkedList` appears to be faster. – NetMage Feb 02 '22 at 19:06
0

You could consider doing this hack:

List<SomeType> subList = new();
originalList.RemoveAll(item =>
{
    bool shouldBeRemoved = item.Number < 0;
    if (shouldBeRemoved) subList.Add(item);
    return shouldBeRemoved;
});

The Predicate<T> passed to the RemoveAll is not pure: it has the side-effect of inserting matched elements in the subList. Based on the implementation of the RemoveAll method, this hack should work as expected. The documentation though does not make the explicit guarantee that the predicate will be invoked only once per element:

The elements of the current List<T> are individually passed to the Predicate<T> delegate, and the elements that match the conditions are removed from the List<T>.

That's why I call it a hack. It wouldn't be a hack if the current behavior of the RemoveAll method was documented and guaranteed. If you want to be ultra safe you could use a custom RemoveAll implementation that has well defined behavior, like the one found in this answer.

You could also make it an extension method:

public static int RemoveAll<T>(this List<T> source, Predicate<T> match,
    out List<T> removed)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(match);

    List<T> removedLocal = removed = new();
    int removedCount = source.RemoveAll(item =>
    {
        bool shouldBeRemoved = match(item);
        if (shouldBeRemoved) removedLocal.Add(item);
        return shouldBeRemoved;
    });
    Debug.Assert(removedCount == removed.Count);
    return removedCount;
}

Usage example:

originalList.RemoveAll(x => x.number < 0, out var subList);
Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104