18

What is the best way to remove a set from a collection, but still keep the items that were removed in a separate collection?

I have written an extension method that does that, but I think there must be a better way. Here is my function:

public static List<T> FindAndRemove<T>(this List<T> lst, Predicate<T> match)
{
    List<T> ret = lst.FindAll(match);
    lst.RemoveAll(match);
    return ret;
}

And you would use it like this:

List<String> myList = new List<String>();
myList.Add("ABC");
myList.Add("DEF");
myList.Add("ABC");
List<String> removed = myList.FindAndRemove(x => x == "ABC");
// myList now contains 1 item (DEF)
// removed now contains 2 items (ABC, ABC)

I'm not 100% sure what goes on behind the scenes in the FindAll and RemoveAll methods, but I imagine a better way would be to somehow "transfer" items from one list to the other.

Eric
  • 2,098
  • 4
  • 30
  • 44
  • 2
    Your solution is the most efficient. – Kirill Polishchuk Mar 05 '12 at 15:31
  • 2
    You implementation looks fine to me. Copy is a cheap operation in .Net so there is no reason for "transfer" (unless you need some thread/exception safety that an object is never in more than one collection at a time) – adrianm Mar 05 '12 at 15:43
  • I agreee. Using the built in LINQ is the purpose to make life easier.. what goes on behine the scenes will be the best solution chosen by MS. Because you are in C# that is fine to the eye, but VB you would daisy chain your query `return = lst.FindAll(match).RemoveAll(match)` to keep in-line with read the code style. – Piotr Kula Mar 05 '12 at 16:26
  • 1
    @ppumkin The custom extension method isn't necessary though. You can just use a .Where() and make it not equal to what you want and then ToList it and BAM. – Jetti Mar 05 '12 at 16:31

4 Answers4

10

Op's answer is the best out of the proposed and suggested solutions so far. Here are timings on my machine:

public static class Class1
{
    // 21ms on my machine
    public static List<T> FindAndRemove<T>(this List<T> lst, Predicate<T> match)
    {
        List<T> ret = lst.FindAll(match);
        lst.RemoveAll(match);
        return ret;
    }

    // 538ms on my machine
    public static List<T> MimoAnswer<T>(this List<T> lst, Predicate<T> match)
    {
        var ret = new List<T>();
        int i = 0;
        while (i < lst.Count)
        {
            T t = lst[i];
            if (!match(t))
            {
                i++;
            }
            else
            {
                lst.RemoveAt(i);
                ret.Add(t);
            }
        }
        return ret;
    }

    // 40ms on my machine
    public static IEnumerable<T> GuvanteSuggestion<T>(this IList<T> list, Func<T, bool> predicate)
    {
        var removals = new List<Action>();

        foreach (T item in list.Where(predicate))
        {
            T copy = item;
            yield return copy;
            removals.Add(() => list.Remove(copy));
        }

        // this hides the cost of processing though the work is still expensive
        Task.Factory.StartNew(() => Parallel.ForEach(removals, remove => remove()));
    }
}

[TestFixture]
public class Tester : PerformanceTester
{
    [Test]
    public void Test()
    {
        List<int> ints = Enumerable.Range(1, 100000).ToList();
        IEnumerable<int> enumerable = ints.GuvanteSuggestion(i => i % 2 == 0);
        Assert.That(enumerable.Count(), Is.EqualTo(50000));
    }
}
David Peden
  • 17,596
  • 6
  • 52
  • 72
3

I don't agree that it is the most efficient - you are calling the predicate match twice on each element of the list.

I'd do it like this:

    var ret = new List<T>(); 
    var remaining = new List<T>(); 
    foreach (T t in lst) {
        if (match(t)) 
        { 
            ret.Add(t); 
        } 
        else 
        { 
            remaining.Add(t); 
        } 
    }
    lst.Clear();
    lst.AddRange(remaining);
    return ret; 
MiMo
  • 11,793
  • 1
  • 33
  • 48
  • 1
    Iterating through a list once, and doing the least work to achieve the required result. What's not to like? – Neil Moss Mar 05 '12 at 16:26
  • 1
    @ppumkin: The alternative is to create a list to hold the found results and iterate through it afterwards, since you cannot mutate a list in a foreach. – Guvante Mar 05 '12 at 16:31
  • @Neil Moss - I never said its wrong.. I tried to imply its old.. yes. But mostly It not .NET structure. That is all :) -PS Looks like JAVA or C . In any MS.NET Exam this answer is WRONG WRONG WRONG.. not my fault. – Piotr Kula Mar 05 '12 at 16:46
  • 1
    Thanks to DPedan for checking the execution time of my previous proposed solution - it was very slow due to the use of RemoveAt() - now I fixed it. The new solution above is as fast as the original one in the question when the predicate is fast, and around twice as fast when the predicate is slow and so its execution dominates the overall execution time. – MiMo Mar 06 '12 at 13:43
0

What you should be trying to do is partition your original list into two new lists. The implementation should work on any IEnumerable, not just lists, and should assume that the source is immutable. See this post on partitioning: LINQ Partition List into Lists of 8 members. I think MoreLinq has it covered already.

Community
  • 1
  • 1
Phil
  • 42,255
  • 9
  • 100
  • 100
-1

Depending upon the size of your collection, you might want to implement it as a HashSet rather than a List. In sufficiently large collections (how large is "sufficient" has been somewhat dependent on what is in the collection, in my experience), HashSets can be much, much faster at finding items within themselves than Lists.

devstruck
  • 1,497
  • 8
  • 10
  • Not applicable. `HashSet` is fast at `Contains`. That is, finding whether a *specified item* is in a collection. The question is about *iterating* over a collection, finding members that *match a predicate*; doing so does not involve use of `Contains`. – ToolmakerSteve Jan 31 '20 at 18:59