Implementing a LINQ "chunk by predicate" with O(1) memory, O(n) runtime complexity and no double-enumeration

Question

Examples: Suppose the predicate is i == 0.

Then

[1] -> [(1)]
[0] -> []
[1, 0] -> [(1)]
[0, 1] -> [(1)]
[0, 0] -> []
[1, 1, 0] -> [(1, 1)]
[1, 0, 1] -> [(1), (1)]
[1, 1, 0, 0, 1, 0, 1, 1, 1] -> [(1, 1), (1), (1, 1, 1)]

Basically, returning contiguous subsegments where the predicate is false.

I thought this would work

internal static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> source, Func<T, bool> condition)
{
    IEnumerator<T> mover = source.GetEnumerator();

    for (; mover.MoveNext() ; )
    {
        var chunk = mover.MoveUntil(condition);

        if (chunk.Any())
        {
            yield return chunk;
        }
    }          
}

private static IEnumerable<T> MoveUntil<T>(this IEnumerator<T> mover, Func<T, bool> condition)
{
    bool hitCondition = false;

    do
    {
        if (condition(mover.Current))
        {
            hitCondition = true;
        }
        else
        {
            yield return mover.Current;
        }
    }
    while (!hitCondition && mover.MoveNext());
}

but I was seeing that for example with [1, 1, 0] it will return [(1), (1)]. I don't completely understand why. I can make it work if I change

var chunk = mover.MoveUntil(condition);

to have mover.MoveUntil(condition).ToList(); but if possible I'd like to not have to hold any of the subsegments in memory.

I don't understand your expected results, particularly the last one. Also, it would be easier to understand if every number wasn't 1 or 0. Use some 2, 3 etc. — mjwills, Jun 16 '19 at 23:52
@mjwills I had a slight type-o in the last one. Just fixed it. To explain, if you have `[1, 1, 0, 0, 1, 0, 1, 1, 1]` and the predicate is `i == 0`, then you want all subsegments that are separated by one or more `0`s, so those will be the first 2 numbers `(1, 1)`, the 1 in the middle `(1)`, then the last 3 `(1, 1, 1)`. — i cant codez, Jun 17 '19 at 00:31
Is iterating the sequence multiple times acceptable? (I.e. by storing starting position for each inner enumerable) — Alexei Levenkov, Jun 17 '19 at 00:42
@AlexeiLevenkov Ideally not. That's what I meant by "no double-enumeration" in title. — i cant codez, Jun 17 '19 at 00:45
It would be easier to understand if every number wasn't 1 or 0. Use some 2, 3 etc. This is important to understand to see whether `GroupAdjacent` is an option or not. — mjwills, Jun 17 '19 at 01:16

Jämes · Answer 1 · 2019-06-17T12:33:26.513

Firstly, I think you wanted O(n) as memory complexity, as your output length is linearly proportional to the input. As a big fan of functional programming, I have chosen to use a fold (that corresponds to the LINQ function Aggregate in C#).

Basically, we start with an empty collection of collection and a flag that indicates whether the next iteration has to create a new sub-collection (we only know that when the predicate matches, i.e. in the previous iteration). I use a tuple that contains those two elements as an accumulator. I extracted the logic of the Aggregate in a separated function for clarity purpose.

static IEnumerable<IEnumerable<T>> Partition<T>(this IEnumerable<T> a, Func<T, bool> predicate)
{
    // The accumulator is a tuple defined as: (collection, mustCreateNewList)
    return a.Aggregate((new List<List<T>>(), true), (acc, e) => ForEachElement(acc, e, predicate)).Item1;
}

static (List<List<T>>, bool) ForEachElement<T>((List<List<T>>, bool) acc, T e, Func<T, bool> predicate)
{
    var (collection, mustCreateNewList) = acc;

    // The predicate matches, continue to iterate!
    if (predicate(e)) return (collection, true);

    // The previous iteration requests to create a new list
    if(mustCreateNewList) collection.Add(new List<T>());

    // Add the current element to the last list
    collection[collection.Count - 1].Add(e);

    return (collection, false);
}

The initial collection is walked through once (O(n)) and the length of the output has the length of the input in the worst case (O(n)).

Example of call:

var array = new int[] { 1, 1, 0, 0, 1, 0, 1, 1, 1 };
var result = array.Partition(i => i == 0);

I guess I meant "O(1) memory at any given time during the enumeration" — i cant codez, Jun 17 '19 at 14:30
I'm sorry, but that statement does not make sense to me. In a global manner, you request the output to have a size between 0 and n, where n is the number of non matching element according the predicate. That means the the output size is bound to the input size, hence n. The code I provided only allocate one-list/element at each iteration. I don't know if there is what you are referring by "any given time during the enumeration" ? — Jämes, Jun 17 '19 at 21:40

Marcel Gosselin · Answer 2 · 2019-06-17T04:12:48.830

It is possible to stream the results using LINQ calls. The below implementation:

Does not create temporary Lists to reduce memory consumption, I think it would be O(1) for the memory as only one sub-segment is dealt with at a time.
There will be no double enumeration and the predicate will be called exactly once per record.
It would be O(n) for the runtime because like this answer suggests, the GroupBy operation should be O(n) and the other LINQ calls are single-pass operations so should also be O(n).


    public static IEnumerable<IEnumerable<T>> PartitionBy<T>(this IEnumerable<T> a, Func<T, bool> predicate)
    {
        int groupNumber = 0;
        Func<bool, int?> getGroupNumber = skip =>
        {
            if (skip)
            {
                // prepare next group, we don't care if we increment more than once
                // we only want to split groups
                groupNumber++;
                // null, to be able to filter out group separators
                return null;
            }
            return groupNumber;
        };
        return a
            .Select(x => new { Value = x, GroupNumber = getGroupNumber(predicate(x))} )
            .Where(x => x.GroupNumber != null)
            .GroupBy(x => x.GroupNumber)
            .Select(g => g.Select(x => x.Value));       
    }

Implementing a LINQ "chunk by predicate" with O(1) memory, O(n) runtime complexity and no double-enumeration

2 Answers2