1

The List<T>.RemoveAll is a quite useful method, that allows to remove efficiently multiple items from a list. Unfortunately in some scenarios I needed some extra features that the method doesn't have, and some guarantees that the documentation doesn't provide. It also has a questionable behavior in case the match predicate fails, that causes me anxiety. So in this question I am asking for an implementation of the same method, in the form of an extension method, with these features and characteristics:

  1. Instead of a Predicate<T> it accepts a Func<T, int, bool> delegate, where the int is the zero-based index of the T item.
  2. It guarantees that the predicate will be invoked exactly once for each item, in a stricly ascending order.
  3. In case the predicate returns true for some items and then fails for another item, the items that have been elected for removal are removed from the list before the propagation of the exception.

Here is the signature of the extension method that I am trying to implement:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate);

It returns the number of elements that were removed.

I attempted to implement it using as starting point the existing implementation, but it has some performance optimizations that make it quite complex, and injecting the desirable "exceptional" behavior is not obvious. I am interested for an implementation that is simple and reasonably efficient. Using LINQ in the implementation is not desirable, because it implies memory allocations that I would like to avoid.


Context: I should demonstrate the behavior of the built-in List<T>.RemoveAll method, and explain why I don't like it. In case the match predicate fails for an item in the middle of the list, the items that have already been elected for removal are either not removed, or they are replaced with duplicates of other elements. In all cases the list retains its original size. Here is a minimal demo:

List<int> list = new(Enumerable.Range(1, 15));
Console.WriteLine($"Before RemoveAll: [{String.Join(", ", list)}]");
try
{
    list.RemoveAll(item =>
    {
        if (item == 10) throw new Exception();
        bool removeIt = item % 2 == 1;
        if (removeIt) Console.WriteLine($"Removing #{item}");
        return removeIt;
    });
}
catch (Exception ex) { Console.WriteLine(ex); }
finally
{
    Console.WriteLine($"After RemoveAll: [{String.Join(", ", list)}]");
}

The list has 15 numbers, and the intention is to remove the odd numbers from the list. The predicate fails for the 10th number.

Output:

Before RemoveAll: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
Removing #1
Removing #3
Removing #5
Removing #7
Removing #9
System.Exception: Exception of type 'System.Exception' was thrown.
   at Program.<>c.<Main>b__0_0(Int32 item)
   at System.Collections.Generic.List`1.RemoveAll(Predicate`1 match)
   at Program.Main()
After RemoveAll: [2, 4, 6, 8, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

Online demo.

As you can see the numbers 1 and 3 have been removed, the 5, 7 and 9 are still there, and the numbers 6 and 8 have been duplicated (there are two occurrences of each). On the contrary the output that I expected to see is:

After RemoveAll: [2, 4, 6, 8, 10, 11, 12, 13, 14, 15]

This would be a reasonable and predictable behavior I could count on. It keeps the levels of danger in a manageable level. I am not risking, for example, duplicating items in a virtual shopping cart, or printing twice some PDF documents from a selection. The existing behavior stretches a bit too much my comfort levels.

I have reported this behavior to Microsoft, and the feedback that I've got is that in case of failure the outcome is undefined. From their point of view there is no difference between the two above outputs (the actual and the expected). Both are equally corrupted, because both represent a state that is neither the original nor the final/correct state after a successful execution. So they don't think that there is any bug that needs to be fixed, and doing changes that could potentially affect negatively the performance of successful executions is not justified. They also believe that the existing behavior is not surprising or unexpected, so there is no reason to document it.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • I would debate your requirement #3: removing some items but then throwing also leaves the list in some intermediate state (though not as corrupt as the current RemoveAll behaviour). Wouldn't it be cleaner to not touch the list in this case, so you get sort of transactional behaviour? – Klaus Gütter Jan 06 '23 at 07:56
  • What about splitting into two methods: `int[] GetIndexes(this List list, Func predicate);` (does not modify the state of the list) and `void RemoveAllIndexes(this List list, int[] indexes);` – Klaus Gütter Jan 06 '23 at 07:59
  • @KlausGütter in order to implement transactional behavior you'll have to take a snapshot of the list before starting to remove items. This would be a big allocation for large lists, and a significant performance hit. If you want you could post a transactional implementation as an answer, and I will likely upvote it, but it's unlikely that I'll accept it. – Theodor Zoulias Jan 06 '23 at 08:29

4 Answers4

1

This solution is based on the idea to separate the selection of the items to be removed from the removal itself.

This has the following advantages:

  • If during the selection process, an exception occurs, the list will be left untouched
  • The removal process can only fail in catastrophic cases (OutOfMemoryException etc.)

But of course also some disadantages:

  • it requires extra memory to store the intermediate selection result
  • some optimizations might not be as effective

Because of the mentioned optimizations, I chose to base the selection result on ranges instead of individual indexes, so we can use List.RemoveRange which if more effective than individual RemoveAt calls (assumed that there are in fact ranges with more than one element).

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate)
{
    var result = new List<(int start, int count)>();
    int start = -1;
    for (var i = 0; i < list.Count; i++)
    {
        // see note 1 below
        bool toBeRemoved = predicate(list[i], i);
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range at the end
        result.Add((start, list.Count - start));
    }
    return result;
}

public static int RemoveIndexRanges<T>(this List<T> list, 
    List<(int start, int count)> ranges)
{
    var removed = 0;
    foreach (var range in ranges)
    {
        // the "- removed" is there to take into account 
        // that deletion moves the indexes.
        list.RemoveRange(range.start - removed, range.count);
        removed += range.count;
    }
    return removed;
}

Usage:

var ranges = list.GetIndexRanges((item, index) =>
    {
        //if (item == 10) throw new Exception();
        return item % 2 == 1;
    });
// See note 2 below
list.RemoveIndexRanges(ranges);

Note 1: As is, an exception in the predicate would just be propagated during the selection process, with no change to the ecollection. To give the caller more control over this, the following could be done: extend GetIndexRanges to still return everything collected so far, and in addition also return any exception as out parameter:

public static List<(int start, int count)> GetIndexRanges<T>(this List<T> list, 
    Func<T, int, bool> predicate, out Exception exception)
{
    var result = new List<(int start, int count)>();
    int start = -1;
    for (var i = 0; i < list.Count; i++)
    {
        bool toBeRemoved = false;
        try 
        { 
            toBeRemoved = predicate(list[i], i); 
        }
        catch (Exception e) 
        { 
            exception = e;
            break; // omit this line to continue with the selection process
        }
        if (toBeRemoved)
        {
            if (start < 0)
                start = i; // new range starts
        }
        else if (start >= 0)
        {
            // range finished
            result.Add((start, i - start));
            start = -1;
        }
    }
    if (start >= 0)
    {
        // orphan range at the end
        result.Add((start, list.Count - start));
    }
    return result;
}

var ranges = list.GetIndexRanges((item, index) =>
    {
        if (item == 10) throw new Exception();
        return item % 2 == 1;
    }, out var exception);

// to fulfil requirement #3, we remove the ranges collected so far
// even in case of an exception
list.RemoveIndexRanges(ranges);

// and then throw the exception afterwards
if (exception != null) 
    ExceptionDispatchInfo.Capture(exception).Throw();

Note 2: As this is now a two-step process, it will fail if the list changes between the calls.

Klaus Gütter
  • 11,151
  • 6
  • 31
  • 36
  • 1
    This doesn't do what OP wants (if `predicate()` throws, no ranges-to-remove are returned at all), and the operation isn't atomic anymore (one could modify the list between calling `GetIndexRanges()` and `RemoveIndexRanges()`). – CodeCaster Jan 06 '23 at 12:06
  • 1
    Thank you for the comment. I extended the answer regarding the exception handling. You are right regarding atomicity. – Klaus Gütter Jan 06 '23 at 12:16
  • 1
    Also, multiple `RemoveRange()` calls incur multiple moves/resizes, possibly inefficient. – CodeCaster Jan 06 '23 at 12:25
  • @CodeCaster As far as I can see, there is no way to avoid this inefficiency when we can only use the public API of `List`. – Klaus Gütter Jan 06 '23 at 12:28
  • Thanks Klaus for the answer. What is the point of splitting the `RemoveAll` to `GetIndexRanges`+`RemoveIndexRanges`? Do you anticipate that any of those can be usefull independently? If not, then why aren't you just combine them to one? Also if I understand correctly your suggestion `try { toBeRemoved = predicate(list[i], i); } catch { }` will swallow the exception, and partial ranges will be returned, without the caller being informed about it. This is not what I want. Also the worse-case performance of this solution is probably far worse than the built-in `RemoveAll`. – Theodor Zoulias Jan 06 '23 at 13:34
  • @TheodorZoulias The idea is to have one method which might throw, but not alter the state and another which alters the state but will not throw in order to get transactional behavior (except for catastrophic errors). – Klaus Gütter Jan 06 '23 at 15:20
  • I extended the Note 1 to provide something nearer to your requirement #3 – Klaus Gütter Jan 06 '23 at 15:29
  • I understand the transactional idea, but if the two method are intended to be called in tandem, why not combine them to one method, with the name `RemoveAll`? – Theodor Zoulias Jan 06 '23 at 15:51
  • Of course you can combine them. I just thought that the decision what to do if an exception occurred in the selection phase would be better located at the call site. – Klaus Gütter Jan 06 '23 at 16:19
  • My expectation about an exception in the selection phase is the direct propagation of the exception to the caller. Which is what C# does by default. The remove phase will not be invoked. – Theodor Zoulias Jan 06 '23 at 16:22
  • This would then be the version without Note1 – Klaus Gütter Jan 06 '23 at 17:00
1

I think that I've managed to come up with an implementation that satisfies all three requirements:

/// <summary>
/// Removes all the elements that match the conditions defined by the specified
/// predicate. In case the predicate fails, the integrity of the list is preserved.
/// </summary>
public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    ArgumentNullException.ThrowIfNull(list);
    ArgumentNullException.ThrowIfNull(predicate);

    Span<T> span = CollectionsMarshal.AsSpan(list);
    int i = 0, j = 0;
    try
    {
        for (; i < span.Length; i++)
        {
            if (predicate(span[i], i)) continue;
            if (j < i) span[j] = span[i];
            j++;
        }
    }
    finally
    {
        if (j < i)
        {
            for (; i < span.Length; i++, j++)
                span[j] = span[i];
            list.RemoveRange(j, span.Length - j);
        }
    }
    return i - j;
}

For better performance it uses the CollectionsMarshal.AsSpan method (.NET 5) to get a Span<T> out of the list. The algorithm works just as well by using the indexer of the list instead of the span, and replacing the span.Length with list.Count.

Online demo.

I haven't benchmark this implementation, but I expect it to be only marginally slower than the native implementation.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
0

So they don't think that there is any bug that needs to be fixed. They also believe that this behavior is not surprising or unexpected, so there is no need to document it.

They're correct. The method is documented as:

Removes all the elements that match the conditions defined by the specified predicate.

This supports two scenarios: the predicate returning true, removing an element, or false for leaving it as-is. A predicate throwing an exception is not a use case intended to be supported.

If you want to be able to pass a predicate that may throw, you could wrap it like this:

public static int RemoveAll<T>(this List<T> list, Func<T, int, bool> predicate)
{
    Exception? caught = null;
    int index = 0;
    int removed = 0;

    list.RemoveAll(item =>
    {
        // Ignore the rest of the list once thrown
        if (caught != null) return false;

        try
        {
            var remove = predicate(item, index);
            if (remove)
            {
                removed++;
            }

            return remove;
        }
        catch (Exception e)
        {
            caught = e;
            return false;
        }

        index++;
    });

    if (caught != null)
    {
        throw caught;
    }

    return removed;
}
CodeCaster
  • 147,647
  • 23
  • 218
  • 272
  • Thanks CodeCaster for the answer. Catching and deferring the throw of the exception is quite clever, I don't know why I didn't thought about it. What I found problematic in this solution is that it is based on the assumption that the built-in `RemoveAll` invokes the `match` once for each element, in an ascending order. As far as I can see this behavior is not documented explicitly, so in theory it could change in a future version of the .NET runtime. So it doesn't satisfy the 2nd requirement in my question. Also a minor flaw is that the stack trace of the rethrown exception is lost. – Theodor Zoulias Jan 06 '23 at 13:51
  • *"They're correct."* -- I know that my question provokes the expression of opinions, and you are free to express yours, but if I had asked peoples' opinions about the behavior of the built-in API, my question most likely would be closed as opinion-based. It's a pity that Microsoft has the policy of freezing automatically all GitHub issues one month after they are closed and inactive, so people can't express their opinions without going through the hassle of opening new follow-up issues. IMHO this is a bad policy, because it results in Microsoft receiving less feedback about their products. – Theodor Zoulias Jan 06 '23 at 14:05
  • Right on account of the code criticism, the `index` exposes an implementation detail and doesn't guarantee what you want - but you can't, without iterating yourself, losing the built-in optimization. You can just use the `try-catch` clause without the extension method if you're willing to drop the index requirement. For the throw, see https://stackoverflow.com/questions/57383/. For the opinion: yes, that last paragraph smells of frustration, so I thought I'd respond to that. – CodeCaster Jan 06 '23 at 14:19
  • I am hoping that I can mimic/modify the built-in implementation, using the public API of the `List`. The complexity would stay optimal O(n), with just some added overhead caused by parameter checking and incrementing the `_version` field, with which I am OK. That's the direction that I am working at the moment. – Theodor Zoulias Jan 06 '23 at 14:28
  • Each `Remove(Range)()` involves an array copy though, which `RemoveAll()` only does once. – CodeCaster Jan 06 '23 at 14:58
  • The [built-in implementation](https://github.com/dotnet/runtime/blob/b0601cfa4a0366f271f8f3f041d40673243a6886/src/libraries/System.Private.CoreLib/src/System/Collections/Generic/List.cs#L924) doesn't use the `Remove`. It uses the get/set indexer of the backing `_items` array. I should be able to use the get/set indexer of the list itself in the same way. – Theodor Zoulias Jan 06 '23 at 15:48
  • Not using the public API. – CodeCaster Jan 06 '23 at 16:07
  • 1
    I might prove you wrong if/when I have my implementation ready. :-) – Theodor Zoulias Jan 06 '23 at 16:13
  • I just noticed that the line `index++;` produces a warning *"Unreachable code detected"*, and the `index` argument is always `0` ([demo](https://dotnetfiddle.net/RlWrG8)). I haven't downvoted the answer btw. It's a decent answer, I don't know why it was downvoted. – Theodor Zoulias Mar 12 '23 at 10:21
  • 1
    @Theodor whoops. Looks like that should be in a `finally` block, I'll edit later. – CodeCaster Mar 12 '23 at 10:23
-2

I don't know microsoft is how to wrote this method.

I tried some code block. And i found case.

Actually problem is your throw new Exception(). If you dont this code that time yo code will run perfect. Exception trigger some another case. But i dont know what is that.

if (item >= 10) return false;
bool removeIt = item % 2 == 1;
if (removeIt) Console.WriteLine($"Removing #{item}");
return removeIt;

I found this. EDIT

Actually Func<T, int, bool> property is not deleted some item. It return boolean. As if return true he succesful deleted from list. If return false. it is not deleted from list.

Emin Niftiyev
  • 197
  • 1
  • 13
  • Thanks Emin for your answer, but the code in my question has demonstration purposes. In the actual code that uses the `RemoveAll` method I am calling API's that are not under my direct control, and so I can't prevent them from throwing exceptions. Even if I `catch` the exception inside the `match` predicate, what value should I return, `true` or `false`? Non of them makes sense. The exception should be propagated. – Theodor Zoulias Jan 06 '23 at 06:13
  • yes. it is correct. You cant use throw exception. – Emin Niftiyev Jan 06 '23 at 06:21
  • @TheodorZoulias personally, I wouldn't call APIs and do complex logic *inside* the `match` predicate, rather, pre-fetch all the data required and then process it once you've successfully done that – MindSwipe Jan 06 '23 at 06:24
  • @MindSwipe would you also do the same if the `List.RemoveAll` had the behavior that I am asking in my question, and that behavior was documented? – Theodor Zoulias Jan 06 '23 at 06:29
  • @TheodorZoulias yes, because I don't like mixing the fetching of data with the processing of said data, IMO it makes it easier to reason about the code, as well as making it a lot easier to test and reuse the code. – MindSwipe Jan 06 '23 at 06:42
  • @MindSwipe I don't think that calling the `RemoveAll` qualifies as data processing. It removes elements for the list in the same sense that the `Add`/`AddRange` adds elements in the list. Maybe the [`List.ForEach`](https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.foreach) could be said that facilitates data processing, since the items might be mutated inside the `action`. – Theodor Zoulias Jan 06 '23 at 06:49
  • 1
    The point of OP's code is that they have code that does sometimes throw an exception. You can't say "don't throw exceptions", because that invalidates the premise of this question. – CodeCaster Jan 06 '23 at 12:03