20

I would like to write a function which filters a sequence using a predicate but the result should also INCLUDE the first item for which the predicate returns false.

The logic would be something like this, if there were a break keyword in F#:

let myFilter predicate s =
    seq {
        for item in s do
            yield item
            if not (predicate item) then
                break
    }

I tried combinations of Seq.takeWhile and Seq.skipWhile, something like this:

Seq.append 
    (Seq.takeWhile predicate s) 
    (Seq.skipWhile predicate s |> Seq.take 1)

...but the problem is that the first item which matches the predicate is lost between the takeWhile and the skipWhile

Also note that the input sequence is lazy, so any solution which consumes the sequence first and makes decisions afterwards is not viable.

Any ideas?

Thanks!

EDIT: Thanks a LOT for all the answers! I didn't expect so many responses so fast. I will take a look at each of them soon. Now I just want to give a little more context. Consider the following coding kata which implements a shell:

let cmdProcessor state = function
    | "q" -> "Good bye!"
    | "h" -> "Help content"
    | c -> sprintf "Bad command: '%s'" c

let processUntilQuit =
    Seq.takeWhile (fun cmd -> cmd <> "q")

let processor = 
    processUntilQuit
    >> Seq.scan cmdProcessor "Welcome!"

module io =
    let consoleLines = seq { while true do yield System.Console.ReadLine () }

    let display : string seq -> unit = Seq.iter <| printfn "%s" 

io.consoleLines |> processor |> io.display

printf "Press any key to continue..."
System.Console.ReadKey () |> ignore

The problem with this implementation is that it doesn't print "Good bye!" when the command q is entered.

What I want to do is to implement the function processUntilQuit such that it processes all the commands until "q", including "q".

vidi
  • What do you mean by *the problem is that the first item which matches the predicate is lost between the takeWhile and the skipWhile*? Your solution is correct because the first false item is still there in `s`. – pad Sep 24 '12 at 10:53
  • @pad: So if I implement the function processUntilQuit like this: let processUntilQuit cmds = let isNotFinished cmd = cmd <> "q" seq {yield! Seq.takeWhile isNotFinished cmds yield! Seq.skipWhile isNotFinished cmds |> Seq.truncate 1} ...the output looks like this: Welcome! q q Good bye! Press any key to continue... As you can see I had to type q twice to exit. – vidi Sep 24 '12 at 15:43

12 Answers

19

The lack of support for break in computation expressions is a bit annoying. It does not fit well with the model used by F# (which is why it is not supported), but it would be really useful in this case.

If you want to implement this using just a single iteration over the sequence, then I think the cleanest solution is to use the underlying structure of sequences and write it as a recursive loop using IEnumerator<'T>.

This is fairly short (compared to other solutions here) and it is quite clear code too:

open System.Collections.Generic

let myFilter predicate (s:seq<_>) =
  /// Iterates over the enumerator, yielding elements and
  /// stops after an element for which the predicate does not hold
  let rec loop (en:IEnumerator<_>) = seq {
    if en.MoveNext() then
      // Always yield the current, stop if predicate does not hold
      yield en.Current
      if predicate en.Current then
        yield! loop en }

  // Get enumerator of the sequence and yield all results
  // (making sure that the enumerator gets disposed)
  seq { use en = s.GetEnumerator()
        yield! loop en }
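To connect this back to the shell kata from the question, here is a sketch that reuses the myFilter above as processUntilQuit. The console is replaced by a hypothetical simulatedConsole sequence with a read counter (an assumption made only so the example runs without typing), which shows that "q" is processed and nothing after it is read.

```fsharp
open System.Collections.Generic

// Same single-pass filter as above: yields elements up to and
// including the first one for which the predicate does not hold.
let myFilter predicate (s: seq<_>) =
    let rec loop (en: IEnumerator<_>) =
        seq {
            if en.MoveNext() then
                yield en.Current
                if predicate en.Current then
                    yield! loop en
        }
    seq {
        use en = s.GetEnumerator()
        yield! loop en
    }

let cmdProcessor state = function
    | "q" -> "Good bye!"
    | "h" -> "Help content"
    | c -> sprintf "Bad command: '%s'" c

// Processes every command up to and including "q".
let processUntilQuit cmds = myFilter (fun cmd -> cmd <> "q") cmds

let processor = processUntilQuit >> Seq.scan cmdProcessor "Welcome!"

// Hypothetical stand-in for io.consoleLines that counts the lines it hands out.
let linesRead = ref 0
let simulatedConsole =
    seq {
        for line in [ "h"; "x"; "q"; "never read" ] do
            linesRead.Value <- linesRead.Value + 1
            yield line
    }

let output = simulatedConsole |> processor |> Seq.toList
// output = ["Welcome!"; "Help content"; "Bad command: 'x'"; "Good bye!"], linesRead = 3
```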
Tomas Petricek
  • This is not really functional but it works and it's clean enough. Thanks! It would still be interesting to see how this can be written in a nice functional way – vidi Sep 24 '12 at 18:13
  • I do not have a "proof" for this claim, but I think it cannot be written in a "nice functional way" because the underlying data structure, `seq<'T>`, is not really functional. You could write it in a functional style if you used a lazy list, but sequences are more idiomatic in F# (even if they sometimes force you to use an imperative style to implement some higher-level declarative function). – Tomas Petricek Sep 24 '12 at 19:16
  • @TomasPetricek please have a look at my answer. – silvalli Mar 10 '23 at 00:25
  • @silvalli The problem with your solution is that using `Seq.tail` inside a recursive loop will be inefficient - because it returns a new sequence that iterates over the whole of the original one. – Tomas Petricek Mar 10 '23 at 09:19
  • @TomasPetricek Ew! Isn’t that a bug in Seq.tail? Can it not be fixed? – silvalli Mar 10 '23 at 12:35
  • @silvalli I'm afraid it is not a bug and it can't be fixed - it is a necessary consequence of the fact that sequences are lazily generated and not cached (you could add caching using `Seq.cache`, but that has other problems) and that the underlying API is imperative. (Any implementation of `Seq.tail` will have to call `GetEnumerator` on the sequence it gets and skip over the first element - but if it gets a sequence returned by a previous `Seq.tail`, that too will call `GetEnumerator` on the previous sequence and skip the first element; so after _n_ steps, you'll call `GetEnumerator` n times!) – Tomas Petricek Mar 10 '23 at 22:20
  • @TomasPetricek Thanks. Too bad. It just seems to me that F# should enable writing efficient code for this without having to explicitly use the .NET library types. – silvalli Mar 11 '23 at 00:04
  • Well, this is the trade-off of having to exist in a wider ecosystem... If you do not use .NET library types, you'll be fine - you can use a lazy list, an ordinary list or some custom collection type - but alas, `seq<'T>` is a .NET library type that is quite often needed... – Tomas Petricek Mar 13 '23 at 08:08
  • @TomasPetricek please have another look at my answer :) – silvalli Mar 20 '23 at 07:01
  • @TomasPetricek did you see @rkrahl’s `myFilterLazy2` answer below? Shouldn’t that perform the same as your `IEnumerator` solution? – silvalli Mar 22 '23 at 01:16
  • @silvalli That has exactly the same performance problem as a solution based on `Seq.tail` (because it recursively uses `Seq.skip 1`, which is essentially the same thing). – Tomas Petricek Mar 22 '23 at 11:14
5

I don't really see what the problem with your solution is.

Two small corrections:

(1) Use a sequence expression for readability.

(2) Use Seq.truncate instead of Seq.take in case the input sequence is empty.

let myFilter predicate s = 
    seq { yield! Seq.takeWhile predicate s
          yield! s |> Seq.skipWhile predicate |> Seq.truncate 1 }
pad
  • Won't it enumerate s twice in worst case (all s's match predicate)? – rkrahl Sep 24 '12 at 11:47
  • Yes, it will. But I don't think it's OP's concern in the question. – pad Sep 24 '12 at 11:56
  • I do not understand your first point - I think both the OP's version and yours are equally lazy. I would write it this way too, but that is just a stylistic preference... (Good point about `truncate` though.) – Tomas Petricek Sep 24 '12 at 12:22
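The double enumeration raised in the comments is what produces the "type q twice" symptom vidi describes: with a console-backed sequence, the second pass reads fresh input. A sketch with a hypothetical counting source standing in for the console makes this visible; it reuses pad's myFilter unchanged.

```fsharp
// pad's version: Seq.takeWhile and Seq.skipWhile each enumerate s separately.
let myFilter predicate s =
    seq { yield! Seq.takeWhile predicate s
          yield! s |> Seq.skipWhile predicate |> Seq.truncate 1 }

// Hypothetical source that counts how many elements have been pulled from it.
let pulls = ref 0
let source =
    seq {
        for i in 1 .. 4 do
            pulls.Value <- pulls.Value + 1
            yield i
    }

let result = myFilter (fun x -> x < 3) source |> Seq.toList
// result = [1; 2; 3], but pulls = 6: elements 1, 2 and 3 are each produced twice,
// once for takeWhile and once for skipWhile.
```

For a list or array this only costs time, but for a source with side effects (like reading the console) each extra pull is an extra read.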
1
let duplicateHead xs = seq { yield Seq.head xs; yield! xs }
let filter predicate xs =
    xs
    |> duplicateHead
    |> Seq.pairwise
    |> Seq.takeWhile (fst >> predicate)
    |> Seq.map snd

An alternative version of duplicateHead, in case you don't like using a computation expression here:

let duplicateHead' xs =
    Seq.append
        (Seq.singleton (Seq.head xs))   // Seq.append needs a sequence, not a bare element
        xs

This approach is based on building tuples of the current and next elements. The predicate is applied to the current element, but the following one is returned.

NOTE: It is not safe when the predicate fails on the very first element. To make it work correctly, you have to rework duplicateHead so that it prepends an element that is certain to pass the predicate.
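The rework suggested in the NOTE can be sketched by wrapping every element in an option and prepending None as a sentinel that always passes. The name filterFromStart is hypothetical. Like the original, it still reads one element past the last one it returns, because Seq.pairwise needs the next element before it can form a pair.

```fsharp
let filterFromStart predicate (xs: seq<'T>) =
    // The sentinel (None) always passes; otherwise test the previous element.
    let previousPassed (prev, _) =
        match prev with
        | None -> true
        | Some p -> predicate p
    seq { yield None; yield! Seq.map Some xs }   // sentinel, then the real elements
    |> Seq.pairwise                              // (previous, current) pairs
    |> Seq.takeWhile previousPassed              // stop once the previous element failed
    |> Seq.choose snd                            // unwrap the current element
```

With this, a predicate that fails on the first element yields just that element, and an empty input yields an empty sequence.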

Be Brave Be Like Ukraine
  • It almost works but it reads one more command after the q and, as you already said, there is still a problem if the first command is q – vidi Sep 24 '12 at 18:22
0

Ugly non-functional solution

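let myfilter f s =
    // true while every element seen so far has passed f
    let stillPassing = ref true
    let newf elem =
        if stillPassing.Value then
            stillPassing.Value <- f elem
            true      // take this element, even if it is the first one that fails f
        else
            false     // stop; takeWhile has already read this extra element
    Seq.takeWhile newf s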
John Palmer
  • With this implementation, after I entered command q it reads one more command and then it exits – vidi Sep 24 '12 at 18:32
0

Ugly functional solution :):

let rec myFilter predicate =
        Seq.fold (fun acc s ->
            match acc with
                | (Some x, fs) -> 
                    match predicate s with
                        | true -> (Some x, fs @ [s])
                        | false -> (Some x, fs)
                | (_, fs) ->
                    match predicate s with
                        | true -> (None, fs @ [s])
                        | false -> (Some s, fs))
            (None, [])

You end up with a tuple whose first element is an option holding the first non-matching element of the source, and whose second element is the filtered list.

Ugly functional lazy solution (sorry, I didn't read your post correctly the first time):

let myFilterLazy predicate s =
        let rec inner x =
            seq {
                match x with
                    | (true, ss) when ss |> Seq.isEmpty = false ->
                        let y = ss |> Seq.head
                        if predicate y = true then yield y
                        yield! inner (true, ss |> Seq.skip 1)
                    | (_, ss) when ss |> Seq.isEmpty = false ->
                        let y = ss |> Seq.head
                        if predicate y = true then
                            yield y
                            yield! inner (false, ss |> Seq.skip 1)
                        else
                            yield y
                            yield! inner (true, ss |> Seq.skip 1)
                    | _ -> 0.0 |> ignore
            }

        inner (false, s)

I'm not fluent enough in F# to make the terminating case of the match look good; maybe one of the F# gurus can help.

Edit: A not-so-ugly, pure F# solution inspired by Tomas Petricek's answer:

let myFilterLazy2 predicate s =
        let rec inner ss = seq {
            if Seq.isEmpty ss = false then
                yield ss |> Seq.head
                if ss |> Seq.head |> predicate then
                    yield! ss |> Seq.skip 1 |> inner
        }

        inner s
rkrahl
  • It deserves +1 for being the ugliest solution :) Thanks but I was hoping for something cleaner. – vidi Sep 24 '12 at 16:13
  • Thanks for the appreciation :). Third version: pure F# without strange .NET classes ;). Inspired by Tomas Petricek's answer, which is my personal favorite too. – rkrahl Sep 24 '12 at 19:31
  • I've just tried your second solution and it doesn't work. It needs the commands entered twice. – vidi Sep 25 '12 at 10:57
  • @rkrahl @TomasPetricek says `myFilterLazy2` “has exactly the same performance problem as a solution based on Seq.tail (because it recursively uses Seq.skip 1, which is essentially the same thing).” – silvalli Mar 22 '23 at 18:12
0

A bit better one. :)

let padWithTrue n xs = seq { for _ in 1..n do yield true; done; yield! xs }
let filter predicate n xs =
    let ys = xs |> Seq.map predicate |> padWithTrue n
    Seq.zip xs ys
    |> Seq.takeWhile snd
    |> Seq.map fst

This one takes an additional parameter n, which defines how many elements after the last passing one are included.

NOTE: be careful with the single-line padWithTrue (note the done keyword).

Be Brave Be Like Ukraine
  • This one doesn't work. It needs the commands typed in twice. This is the implementation that I wrote based on your suggestion: let processUntilQuit cmds = let padWithTrue n xs = seq { for _ in 1..n do yield true; done; yield! xs } let ys = cmds |> Seq.map (fun cmd -> cmd <> "q") |> padWithTrue 1 Seq.zip cmds ys |> Seq.takeWhile snd |> Seq.map fst – vidi Sep 24 '12 at 18:11
0

I guess what you want is takeUntil:

let takeUntil pred s =
  let state = ref true
  Seq.takeWhile (fun el ->
    let ret= !state
    state := not <| pred el
    ret
    ) s
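Applied to the shell kata, takeUntil makes processUntilQuit include "q". A sketch with a hypothetical counting stand-in for the console also shows the catch vidi reported for the similar stateful takeWhile answer above: Seq.takeWhile has to read one more element after "q" before it can stop. The ref-cell access is written with .Value here instead of ! and :=, which newer F# versions flag as deprecated.

```fsharp
let takeUntil pred s =
    let state = ref true   // true until an element satisfying pred has been taken
    Seq.takeWhile (fun el ->
        let ret = state.Value
        state.Value <- not (pred el)
        ret) s

let processUntilQuit cmds = takeUntil (fun cmd -> cmd = "q") cmds

// Hypothetical stand-in for the console that counts the lines it hands out.
let linesRead = ref 0
let simulatedConsole =
    seq {
        for line in [ "h"; "x"; "q"; "extra"; "more" ] do
            linesRead.Value <- linesRead.Value + 1
            yield line
    }

let commands = processUntilQuit simulatedConsole |> Seq.toList
// commands = ["h"; "x"; "q"], but linesRead = 4: "extra" was read and discarded.
```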
Mike Kowalski
0

This is very old, but I thought I'd contribute because none of the other solutions suggest this...

What about using Seq.scan to keep a two-element stack of predicate results and simply take while the bottom of that stack, representing the previous element's predicate result, is true? (Note: I haven't tested this code.)

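let takeWhileInclusive pred =
    // state: (result for this element, result for the previous element, element)
    Seq.scan (fun (a, _, _) e -> (pred e, a, Some e)) (true, true, None)
    >> Seq.takeWhile (fun (_, prev, _) -> prev)
    >> Seq.choose (fun (_, _, e) -> e)   // also drops the initial None state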
George
0

I get that this is an old question. But there is a more functional solution.

To be honest, though, for this question I prefer the more imperative solution by Tomas Petricek.

let takeWhileAndNext predicate mySequence =
    let folder pred state element =
        match state with
            | Some (_, false) ->
                None
            | _ ->
                Some (Some element, pred element)
    let initialState = Some (None, true)
    Seq.scan (folder predicate) initialState mySequence |> Seq.takeWhile Option.isSome
                                                        |> Seq.map Option.get
                                                        |> Seq.map fst
                                                        |> Seq.filter Option.isSome
                                                        |> Seq.map Option.get

In the penultimate line, |> Seq.filter Option.isSome may be replaced by |> Seq.tail, as no states other than initialState match Some (None, _).

Adhemar
0

Another late answer, but it is "functional", simple, and does not read any elements past the last one in the result sequence.

let myFilter predicate =
    Seq.collect (fun x -> [Choice1Of2 x; Choice2Of2 (predicate x)])
    >> Seq.takeWhile (function | Choice1Of2 _ -> true | Choice2Of2 p -> p)
    >> Seq.choose (function | Choice1Of2 x -> Some x | Choice2Of2 _ -> None)
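The "does not read past the last element" claim can be checked with a hypothetical counting source. The predicate is evaluated as soon as an element is read, and its result is queued directly behind that element, so Seq.takeWhile can stop on the Choice2Of2 false marker without pulling another element from the source.

```fsharp
let myFilter predicate =
    Seq.collect (fun x -> [Choice1Of2 x; Choice2Of2 (predicate x)])
    >> Seq.takeWhile (function | Choice1Of2 _ -> true | Choice2Of2 p -> p)
    >> Seq.choose (function | Choice1Of2 x -> Some x | Choice2Of2 _ -> None)

// Hypothetical source that counts how many elements have been pulled from it.
let pulls = ref 0
let source =
    seq {
        for i in 1 .. 5 do
            pulls.Value <- pulls.Value + 1
            yield i
    }

let result = source |> myFilter (fun x -> x < 3) |> Seq.toList
// result = [1; 2; 3] and pulls = 3: element 4 is never produced.
```

With a console-backed sequence this means processUntilQuit = myFilter (fun cmd -> cmd <> "q") stops reading as soon as "q" is entered.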
Max Kiselev
0

I know it's been ages since the question was asked, but I had to deal with a similar but more generic problem, and I hope someone will find my solution useful.

The idea is to capture the enumerator in a closure and then expose a method (Continue) which iterates through the rest of the original sequence. This method has one boolean parameter: whether to include the current element (the OP's case).

open System

/// Allows fetching elements from the same sequence.
type ContinueSequence<'a> (xs: 'a seq) =

    let en = xs.GetEnumerator()

    member _.Continue (includeCurrent: bool) =
        let s = seq { while en.MoveNext() do yield en.Current }
        let c = seq { en.Current }
        if includeCurrent then
            Seq.append c s
        else
            s

    interface IDisposable with 
        member _.Dispose() =
            en.Dispose()

The actual answer to the question would be:

use cs = new ContinueSequence<_>(a)
let result =
    Seq.append
        (cs.Continue(false) |> Seq.takeWhile predicate)
        (cs.Continue(true) |> Seq.take 1)   // include the element which breaks the predicate

A more generic example:

/// usage example:
let a = seq [1; 2; 3; 4; 5; 6; 7]

use cs = new ContinueSequence<_>(a)   // avoid shadowing the seq builder
let s1 = cs.Continue(false) |> Seq.takeWhile((>) 3) // take 1 and 2, 3 is current
let s2 = cs.Continue(true) |> Seq.take(2)    // take 3 and 4
let s3 = cs.Continue(false) |> Seq.skip(1)   // skip 5

let s = 
    s1 
    |> Seq.append <| s2 
    |> Seq.append <| s3 
    |> Seq.toList

// s = [1; 2; 3; 4; 6; 7]
irriss
0

Here is an enhanced takeWhile for a sequence. A bool parameter determines whether the element that fails the predicate is included in the results.

This attempt is based on an answer by Tomas Petricek, simplified a bit. Alas, it doesn’t work correctly. As Tomas points out below, the result sequence doesn’t reset after it’s iterated, so on subsequent uses the sequence continues to march ahead. Also, because the use binding sits outside the seq, the enumerator is disposed as soon as takeWhile1 returns, before anything has been enumerated. (Note to self: another thing to test for when producing a sequence!)

let takeWhile1 predicate inclusive (s : seq<_>) = 
  use en = s.GetEnumerator()
  let rec loop() =
    seq {
      if en.MoveNext() then
        if predicate en.Current then
          yield  en.Current
          yield! loop()
        elif inclusive then
          yield  en.Current }
  loop()

Here’s another attempt, which is horribly inefficient. Tomas explains why it’s inefficient in the comments on his answer.

let rec takeWhile2 predicate inclusive aSeq =
  let rec loop s =
    let getHead x = Seq.head x, Seq.tail x
    seq {
      if not (s |> Seq.isEmpty) then 
        let head, tail = getHead s
        if predicate head then
          yield head
          yield! tail |> loop
        elif inclusive then
          yield head }
  aSeq |> loop
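To make the cost Tomas describes concrete, here is a sketch that counts how many elements each approach pulls from a hypothetical counting source when the predicate always holds. viaTail mirrors takeWhile2 (inclusive case) and viaEnumerator mirrors Tomas's IEnumerator answer; both names are made up for this comparison.

```fsharp
open System.Collections.Generic

// Mirrors takeWhile2 (inclusive): recurses on Seq.tail.
let rec viaTail predicate (s: seq<'T>) =
    seq {
        if not (Seq.isEmpty s) then
            let head = Seq.head s
            yield head
            if predicate head then
                yield! viaTail predicate (Seq.tail s)
    }

// Mirrors Tomas Petricek's answer: one enumerator shared by the whole loop.
let viaEnumerator predicate (s: seq<'T>) =
    let rec loop (en: IEnumerator<'T>) =
        seq {
            if en.MoveNext() then
                yield en.Current
                if predicate en.Current then
                    yield! loop en
        }
    seq {
        use en = s.GetEnumerator()
        yield! loop en
    }

// Hypothetical source that counts every element it produces.
let countingSource (pulls: int ref) n =
    seq {
        for i in 1 .. n do
            pulls.Value <- pulls.Value + 1
            yield i
    }

let n = 100
let tailPulls = ref 0
let enumeratorPulls = ref 0
viaTail (fun _ -> true) (countingSource tailPulls n) |> Seq.iter ignore
viaEnumerator (fun _ -> true) (countingSource enumeratorPulls n) |> Seq.iter ignore
// enumeratorPulls is exactly n; tailPulls grows roughly quadratically with n,
// because each level re-enumerates the source from the start through its chain of tails.
printfn "Seq.tail: %d pulls, IEnumerator: %d pulls" tailPulls.Value enumeratorPulls.Value
```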

The following is yet another attempt at minimizing the code. This doesn’t compile because “yield may be used only within a list, array, or sequence expression.” It turns out that only code immediately contained in a seq {} can yield, not code in a function defined within the seq.

let takeWhile3 predicate inclusive ( s :seq<_>) = 
  seq {
    use en = s.GetEnumerator()
    let rec loop() =
      if en.MoveNext() then
        if predicate en.Current then
          yield  en.Current
          yield! loop()
        elif inclusive then
          yield en.Current
    loop() }

If the above is modified so that the body of loop() is also wrapped in a seq, the result compiles and works correctly. This seems to be as concise as we can get.

let takeWhile4 predicate inclusive ( s :seq<_>) = 
  seq {
    use en = s.GetEnumerator()
    let rec loop() = seq {
      if en.MoveNext() then
        if predicate en.Current then
          yield  en.Current
          yield! loop()
        elif inclusive then
          yield en.Current } 
    yield! loop() }

Here it is again with the proposed fun rec syntax (doesn’t compile):

let takeWhile5 predicate inclusive (s : seq<_>) = 
  seq {
    use en = s.GetEnumerator()
    yield! fun rec loop() -> seq {
      if en.MoveNext() then
        if predicate en.Current then
          yield  en.Current
          yield! loop()
        elif inclusive then
          yield en.Current }

Here is an optionally inclusive takeWhile à la Tomas Petricek’s answer.

open System.Collections.Generic

let takeWhileP predicate inclusive (s: seq<_>) =
  let rec loop (en:IEnumerator<_>) = seq {
    if en.MoveNext() then
      if predicate en.Current then
        yield  en.Current
        yield! loop en
      elif inclusive then
        yield en.Current }
  seq {
    use en = s.GetEnumerator()
    yield! loop en }
silvalli
  • In your latest version, you still need to run `GetEnumerator` inside the `seq { .. }` block, because otherwise it will not be possible to iterate over the returned sequence twice, but otherwise it should now work & be efficient. – Tomas Petricek Mar 20 '23 at 19:40
  • @TomasPetricek Wow, that’s subtle! Why is that? – silvalli Mar 20 '23 at 21:33
  • @TomasPetricek seems to me that either the first example should work correctly or it should get a compiler error (why?), or at the very least an exception. Btw, I tried explicitly calling en.Dispose before returning, to no avail. – silvalli Mar 22 '23 at 23:06
  • You can move the `loop` function inside the outer sequence expression, but its body has to be a nested `seq`, i.e. keep the `let rec loop() = seq { .. }` structure. – Tomas Petricek Mar 23 '23 at 19:30
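On the "Why is that?" question above: statements written before a seq { .. } run once, when the function is called, while statements inside seq { .. } run again every time the resulting sequence is enumerated. So a GetEnumerator call (with its use binding) placed outside the block is shared by all iterations, and is disposed when the function returns, whereas one placed inside gets a fresh enumerator per iteration. A small sketch with hypothetical counters illustrates the difference:

```fsharp
// Hypothetical counters, only for illustration.
let outsideRuns = ref 0
let insideRuns = ref 0

// Code before the seq { .. } runs once, when the function is called.
let setupOutside () =
    outsideRuns.Value <- outsideRuns.Value + 1
    seq { yield 1; yield 2 }

// Code inside the seq { .. } runs again each time the result is enumerated.
let setupInside () =
    seq {
        insideRuns.Value <- insideRuns.Value + 1
        yield 1
        yield 2
    }

let eagerSetup = setupOutside ()
let deferredSetup = setupInside ()
for _ in 1 .. 2 do
    Seq.toList eagerSetup |> ignore
    Seq.toList deferredSetup |> ignore
printfn "outside: %d, inside: %d" outsideRuns.Value insideRuns.Value   // outside: 1, inside: 2
```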