
Or, in general: how to filter several elements out of a collection, based on different and complex conditions, in a single pass?

Let's say we have a collection of elements:

var cats = new List<Cat>{ new Cat("Fluffy"), new Cat("Meowista"), new Cat("Scratchy")};

and somewhere we use this collection:

public CatFightResult MarchBoxing(List<Cat> cats, string redCatName, string blueCatName)
{
    // two separate passes over the same list, one per cat (First throws if a cat is missing)
    var redCat = cats.First(cat => cat.Name == redCatName);
    var blueCat = cats.First(cat => cat.Name == blueCatName);
    var redValue = redCat.FightValue();
    var blueValue = blueCat.FightValue();
    if (Cat.FightValuesEqualWithEpsilon(redValue, blueValue))
        return new CatFightResult { IsDraw = true };
    return new CatFightResult { Winner = redValue > blueValue ? redCat : blueCat };
}

Question: Is there a nice way to obtain multiple variables from a collection based on some condition(s)? The question probably requires some sort of uniqueness in the collection; let's first assume there is some (e.g. a HashSet/Dictionary).

AND preferably:

  • a SINGLE pass over the collection (the main reason for the question: as you can see, there are 2 filter operations in the method above)
  • a one-liner or close to it, readable, and the shorter the better
  • a generic way (IEnumerable<T>, I think, or ICollection<T>)
  • resistant to typos and safe under changes/additions (minimal repetition of the actual conditions in code, preferably checked)
  • null/exception safety, because my intention is that null is a valid result for an obtained variable

It would also be cool to be able to provide custom conditions, which could probably be done via Func parameters, but I haven't tested that yet.

Here are my attempts, which I've posted in my repo https://github.com/phomm/TreeBalancer/blob/master/TreeTraverse/Program.cs

Here they are adapted to the Cats example:

public CatFightResult MarchBoxing(List<Cat> cats, string redCatName, string blueCatName)
{
    Cat redCat = null;   // 'var redCat = null;' does not compile: null has no type to infer
    Cat blueCat = null;

    //1 kind of a one-liner, but hard to read and error-prone
    foreach (var n in cats) _ = n.Name == redCatName ? redCat = n : n.Name == blueCatName ? blueCat = n : null;

    //2 not error-prone and easy to read (a mistake in an assignment is easy to spot), but not a one-liner and not elegant: redundant fetching, not a single pass at all (one FirstOrDefault scan per key, i.e. O(N*K) for K keys)
    var filter = new[] { redCatName, blueCatName }.ToDictionary(name => name, name => cats.FirstOrDefault(n => n.Name == name));
    redCat = filter[redCatName];
    blueCat = filter[blueCatName];

    //3 readable, with a check for mistyped search keys (the dictionary rejects duplicate keys), but not a one-liner and not actually a single pass
    var dic = new Dictionary<string, Func<Cat, Cat>> { { redCatName, n => redCat = n }, { blueCatName, n => blueCat = n } };
    cats.All(n => dic.TryGetValue(n.Name, out var func) ? func(n) != null : true); // the predicate must stay true, otherwise All() stops at the first match

    //4 best approach, BUT not generic, since ForEach is a List<T> method (of course one can write a simple generic IEnumerable<T> ForEach extension method, and it would be a strong candidate to win)
    cats.ForEach(n => _ = n.Name == redCatName ? redCat = n : n.Name == blueCatName ? blueCat = n : null);

    //5 nice approach, but not a single pass: the collection is enumerated twice (Zip is lazy, so ToList() is needed to run it at all)
    cats.Zip(cats, (n, s) => n.Name == redCatName ? redCat = n : n.Name == blueCatName ? blueCat = n : null).ToList();

    //6 the one I like best, though it's arguable because it breaks the functional style of Linq by causing side effects ('|| true' keeps All() from stopping at the first match)
    cats.All(n => (n.Name == redCatName ? redCat = n : n.Name == blueCatName ? blueCat = n : null) is null || true);

    // the fight itself is the same as in the first version
    var redValue = redCat.FightValue();
    var blueValue = blueCat.FightValue();
    if (Cat.FightValuesEqualWithEpsilon(redValue, blueValue))
        return new CatFightResult { IsDraw = true };
    return new CatFightResult { Winner = redValue > blueValue ? redCat : blueCat };
}

All the options with the ternary operator are not easily extensible and are relatively error-prone, but they are quite short and Linq-ish. They also rely (at some cost in clarity) on not returning or using the actual result of the ternary (via the discard "_" or by turning it into a bool with "is null"). I think the approach with a Dictionary of Funcs is a good candidate for implementing custom conditions: just bake them in together with the variables.
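Following the Dictionary-of-Funcs idea, the conditions can be baked in together with the variables they fill. Below is a minimal sketch, not taken from the linked repo: the names AssignExtensions and AssignFirsts are hypothetical, and each rule is assumed to be a (condition, assign) pair. It makes a single pass, gives each rule only its first match, and stops enumerating once every rule has fired; a variable whose condition never matches simply keeps its initial value (e.g. null).

```csharp
using System;
using System.Collections.Generic;

public static class AssignExtensions
{
    // Hypothetical helper (not part of Linq): every rule pairs a condition with the
    // action that stores the match. The first element satisfying a rule's condition is
    // handed to that rule's action; later matches for the same rule are ignored.
    public static void AssignFirsts<T>(this IEnumerable<T> source,
        params (Func<T, bool> condition, Action<T> assign)[] rules)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (rules == null) throw new ArgumentNullException(nameof(rules));

        var done = new bool[rules.Length];
        var remaining = rules.Length;
        if (remaining == 0) return;

        foreach (var item in source) // single pass over the collection
        {
            for (var i = 0; i < rules.Length; i++)
            {
                if (!done[i] && rules[i].condition(item))
                {
                    rules[i].assign(item);
                    done[i] = true;
                    remaining--;
                }
            }
            if (remaining == 0) break; // every rule satisfied: stop enumerating early
        }
    }
}
```

With the cats from the question, and after declaring Cat redCat = null, blueCat = null;, a call could look like cats.AssignFirsts((c => c.Name == redCatName, c => redCat = c), (c => c.Name == blueCatName, c => blueCat = c)); keeping each condition right next to its assignment is what binds a variable to its query.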

Thank you, looking forward to your solutions! :)

  • You mean like `cats.Where(x=>x.Name==redCatName|| x.Name==blueCatName)`? – Magnetron Feb 20 '20 at 12:57
  • What would you consider a 'nice' way? You ask about a single pass, so it seems you're concerned with performance. On the other hand, you ask for a one-liner that is readable. In my experience these 2 requirements often conflict. Also, for the one-liner part, it's important to know whether, for example, a multi-line generic extension method that you can then use as a one-liner is acceptable. –  Feb 20 '20 at 13:01
  • @Magnetron, not exactly: I'd prefer to get them into different "named variables", as you can see in the MarchBoxing method; in general it is like disassembling the collection back into separate elements – Vlad Fomin Feb 20 '20 at 13:27
  • @Knoop, yes, those are conflicting, but as you can see, my different approaches fit what is requested quite well; still, maybe someone could do it even better? A generic extension method is appreciated too, but that way we still have to assign the different variables at the call site – Vlad Fomin Feb 20 '20 at 13:31
  • @VladFomin may I ask why? For what I can see, you don't need to know which is the "red cat" and which is the blue one, once you have them both, you can compare and return the winner cat object. And once you have them both, it's cheap to assign them to the "named variables" – Magnetron Feb 20 '20 at 13:34
  • The example is simplified, of course; if you have this question, then I haven't explained it clearly. Anyway, in my experience there are such situations, and one is in the code of the tree traversal algorithm on GitHub that I've posted. There, I need to distinguish which node (searched by number/Id) is the starting node and which is the ending node, because they are generated in a specific order and each one references the other. – Vlad Fomin Feb 20 '20 at 13:41
  • yes, you can do this with linq: use `where()`, `take()` and `select()`, and that will iterate once... don't try to reinvent the wheel, because then you will have to test it... – Palcente Feb 20 '20 at 13:48
  • @Magnetron Having them both after the filter means you have to write the assignment code, filtering them again from the new collection, which means more operations and possibly more mistakes – Vlad Fomin Feb 20 '20 at 13:48
  • If you care about lookup performance, you can store the objects in a `Dictionary` or `KeyedCollection` instead of a `List` – Magnetron Feb 20 '20 at 14:01
  • @VladFomin I disagree with them being fitting. All your options will traverse the entire list at least once even if the blue and red cat could be the first 2 elements, so depending on where the items you want to find are in your list, you're already taking a small to big performance hit. –  Feb 20 '20 at 14:17
  • @Knoop good addition, thank you. As for performance, there is of course room to improve the solutions, and maybe some version with `Where` or `Any` could skip further enumeration once all needed values are obtained. But the main point is to avoid (rewrite) things like: foreach(var elem in coll) { if (cond1) var1 = elem; if (cond2) var2 = elem; if (condX) varX = elem;} and, as you mention, skipping further enumeration would add if (allFound) break; at the end – Vlad Fomin Feb 20 '20 at 14:32
  • @Magnetron, I don't think changing the collection type will help me work with `Values` (not with `Keys`), which is what's intended; there is possibly no way to filter things when you rely on properties of the objects in `Values` (of a `Dictionary`) – Vlad Fomin Feb 20 '20 at 14:39
  • Changing collection type will allow to take your 2nd approach, without the need of the `filter` step, just `redCat = cats[redCatName]; blueCat = cats[blueCatName];` – Magnetron Feb 20 '20 at 15:41
  • @Magnetron, your suggestion is great and of course obvious for the simplest case, but what if you need something more specific? I've mentioned the preferable feature of passing any condition (which could of course have different logic or apply to different properties of the elements), and the best way to do that is with delegates/lambdas, as was discussed and posted in the accepted answer. However, thank you for participating! – Vlad Fomin Mar 16 '20 at 07:06
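For comparison, the plain loop described in the comments (if (cond1) var1 = elem; if (cond2) var2 = elem; ... plus if (allFound) break;) could be sketched as follows. The Cat class here is a hypothetical minimal stand-in with only a Name, and CatLookup/FindPair are made-up names; this is an illustration under those assumptions, not anyone's posted code.

```csharp
using System.Collections.Generic;

// Hypothetical minimal Cat type, only so the sketch compiles.
public sealed class Cat
{
    public Cat(string name) => Name = name;
    public string Name { get; }
}

public static class CatLookup
{
    // One pass; each variable takes its first match, and the loop stops once both are found.
    public static (Cat red, Cat blue) FindPair(IEnumerable<Cat> cats, string redCatName, string blueCatName)
    {
        Cat redCat = null, blueCat = null;
        foreach (var cat in cats)
        {
            if (redCat == null && cat.Name == redCatName) redCat = cat;
            if (blueCat == null && cat.Name == blueCatName) blueCat = cat;
            if (redCat != null && blueCat != null) break; // the "if (allFound) break;" step
        }
        return (redCat, blueCat); // null means "not found", as the question requires
    }
}
```

It is a single pass with early exit and returns null for a missing cat, but every new variable means another if and another term in the break condition, which is exactly the repetition the question wants to avoid.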

1 Answer


I'm not sure it's possible with Linq out of the box, but if writing a custom extension once is an option for you, retrieving several values from a collection with an arbitrary number of conditions can then be expressed in a pretty concise manner.

For example, you may write something like:

var (redCat, blueCat) = cats.FindFirsts(
    x => x.Name == redCatName,
    x => x.Name == blueCatName);

If you introduce the FindFirsts() extension as follows:

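using System;
using System.Collections.Generic;
using System.Linq;

public static class FindExtensions
{
    // Returns, for each condition, the first element that satisfies it (default(T), e.g. null,
    // if none does), in the same order as the conditions. The collection is enumerated once,
    // and enumeration stops as soon as every condition has been matched.
    public static T[] FindFirsts<T>(this IEnumerable<T> collection, 
        params Func<T, bool>[] conditions)
    {
        if (collection == null) throw new ArgumentNullException(nameof(collection));
        if (conditions == null) throw new ArgumentNullException(nameof(conditions));
        if (conditions.Length == 0)
            return Array.Empty<T>();

        var unmatchedConditions = conditions.Length;
        var lookupWork = conditions
            .Select(c => (
                value: default(T),
                found: false,
                cond: c
            ))
            .ToArray();
        foreach (var item in collection) 
        {
            for (var i = 0; i < lookupWork.Length; i++)
            {
                // array elements are variables, so the tuple fields can be updated in place
                if (!lookupWork[i].found && lookupWork[i].cond(item))
                { 
                    lookupWork[i].found = true;
                    lookupWork[i].value = item;
                    unmatchedConditions--;
                }
            }
            if (unmatchedConditions <= 0)
                break; // all conditions matched: no need to look at the rest
        }
        return lookupWork.Select(x => x.value).ToArray();
    }
}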

The full demo can be found here: https://dotnetfiddle.net/QdVJUd

Note: In order to deconstruct the result array (i.e. use var (redCat, blueCat) = ...), you have to define a deconstruction extension. I borrowed some code from this thread to do so.
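The borrowed code itself isn't reproduced here, so the following is only a minimal sketch of what such an extension might look like; ArrayDeconstructExtensions and EnsureLength are hypothetical names. Because FindFirsts returns exactly one element per condition, this sketch insists on an exact length match rather than silently ignoring extra elements.

```csharp
using System;

public static class ArrayDeconstructExtensions
{
    // Enables "var (a, b) = array;" for any T[]. The length check turns a mismatch between
    // the number of conditions and the number of variables into an exception.
    public static void Deconstruct<T>(this T[] array, out T first, out T second)
    {
        EnsureLength(array, 2);
        first = array[0];
        second = array[1];
    }

    // Same for three variables; further arities follow the same pattern.
    public static void Deconstruct<T>(this T[] array, out T first, out T second, out T third)
    {
        EnsureLength(array, 3);
        first = array[0];
        second = array[1];
        third = array[2];
    }

    private static void EnsureLength<T>(T[] array, int expected)
    {
        if (array == null) throw new ArgumentNullException(nameof(array));
        if (array.Length != expected)
            throw new ArgumentException(
                $"Expected {expected} elements but got {array.Length}.", nameof(array));
    }
}
```

With something like this in scope, var (redCat, blueCat) = cats.FindFirsts(...) compiles, and passing three conditions to a two-variable deconstruction fails at run time instead of quietly dropping a result.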

Dmitry Egorov
  • If the selection is on the same property you could also do something like this: https://dotnetfiddle.net/OnMVn5 but to be honest I don't know what the OP is actually looking for. –  Feb 20 '20 at 15:59
  • Hello, all! Thank you, Dmitry Egorov and @Knoop – Vlad Fomin Mar 16 '20 at 06:49
  • As I dug into the case, I found that what I want generally breaks CQRS and cannot be done easily with existing tools. However, your answers showed me how to hide this complexity behind two techniques: enumerating the collection once and doing the filtering in an extension, as Dmitry posted, and the deconstruction feature of recent C# versions, which I had looked into before but never used. The combination of these techniques works excellently. My own methods, which break CQRS (and Linq) principles, work as well and are short, but they are not flexible – Vlad Fomin Mar 16 '20 at 06:55
  • Probably the one requested feature still missing from this approach is robustness under code changes: a variable is not bound to its querying condition, which can lead to mistakes. My methods are not much closer to this goal, though (the condition is only visually near the variable, which doesn't protect from mistakes). Anyway, thank you all, and happy March cat fights :) – Vlad Fomin Mar 16 '20 at 07:25
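The contents of the fiddle in Knoop's comment are not reproduced on this page, so I can't say what it does. As an independent sketch for the same-property case: FindFirstsByKey below is a hypothetical extension that calls a key selector once per element and looks the key up in a Dictionary, so the per-element cost doesn't grow with the number of requested keys. It assumes the requested keys are non-null.

```csharp
using System;
using System.Collections.Generic;

public static class KeyFindExtensions
{
    // Hypothetical helper: for each requested key, returns the first element whose key matches
    // (default(T) if none does), in the same order as the keys. One pass, early exit.
    // Requested keys must be non-null (Dictionary does not accept null keys).
    public static T[] FindFirstsByKey<T, TKey>(this IEnumerable<T> source,
        Func<T, TKey> keySelector, params TKey[] keys)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (keySelector == null) throw new ArgumentNullException(nameof(keySelector));
        if (keys == null) throw new ArgumentNullException(nameof(keys));

        var result = new T[keys.Length];
        // Map each key to the result slots waiting for it (the same key may be requested twice).
        var pending = new Dictionary<TKey, List<int>>();
        for (var i = 0; i < keys.Length; i++)
        {
            if (!pending.TryGetValue(keys[i], out var slots))
                pending[keys[i]] = slots = new List<int>();
            slots.Add(i);
        }
        if (pending.Count == 0) return result;

        foreach (var item in source)
        {
            var key = keySelector(item);
            if (key != null && pending.TryGetValue(key, out var slots))
            {
                foreach (var slot in slots) result[slot] = item;
                pending.Remove(key); // only the first match per key counts
                if (pending.Count == 0) break; // everything found: stop enumerating
            }
        }
        return result;
    }
}
```

Combined with a two-element Deconstruct extension like the one the answer mentions, the call would become var (redCat, blueCat) = cats.FindFirstsByKey(c => c.Name, redCatName, blueCatName); the trade-off compared with FindFirsts is that it only covers conditions of the form "property equals value".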