10

I'm learning F# and I've started to play around with both sequences and match expressions.

I'm writing a web scraper that's looking through HTML similar to the following and taking the last URL in a parent <span> with the paging class.

<html>
<body>
    <span class="paging">
        <a href="http://google.com">Link to Google</a>
        <a href="http://TheLinkIWant.com">The Link I want</a>
    </span>
</body>
</html>

My attempt to get the last URL is as follows:

type AnHtmlPage = FSharp.Data.HtmlProvider<"http://somesite.com">

let findMaxPageNumber (page:AnHtmlPage)= 
    page.Html.Descendants()
    |> Seq.filter(fun n -> n.HasClass("paging"))
    |> Seq.collect(fun n -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
    |> Seq.last
    |> fun n -> n.AttributeValue("href")

However, I run into problems when the class I'm searching for is absent from the page. Specifically, Seq.last throws an ArgumentException with the message: The input sequence was empty.

My first thought was to build another function that matched empty sequences and returned an empty string when the paging class wasn't found on a page.

let findUrlOrReturnEmptyString (span:seq<HtmlNode>) =
    match span with 
    | Seq.empty -> String.Empty      // <----- This is invalid
    | span -> span
    |> Seq.collect(fun (n:HtmlNode) -> n.Descendants() |> Seq.filter(fun m -> m.HasName("a")))
    |> Seq.last
    |> fun n -> n.AttributeValue("href")

let findMaxPageNumber (page:AnHtmlPage)= 
    page.Html.Descendants()
    |> Seq.filter(fun n -> n.HasClass("paging"))
    |> findUrlOrReturnEmptyString

My issue now is that Seq.empty is not a literal and cannot be used in a pattern. Most pattern-matching examples use the empty list [] in their patterns, so I'm wondering: how can I use a similar approach to match empty sequences?

Guy Coder
JoshVarty
  • Just use an `if .. else` here; `match` is just complicating things. (`if Seq.isEmpty span then "" else ...`) – ildjarn Aug 11 '16 at 22:47
  • The example has been simplified, there are a few places in my pipeline when I'd have to start adding `if-else`. Since I'm new to F# I'm mostly wondering if there is a proper way to match empty sequences since it seems common to match empty sequences. – JoshVarty Aug 11 '16 at 22:49
  • If it's common in _your_ code and you are adamant on sticking with `match` then create an active pattern for it. – ildjarn Aug 11 '16 at 22:50

4 Answers

15

The suggestion that ildjarn gave in the comments is a good one: if you feel that using match would create more readable code, then make an active pattern to check for empty seqs:

let (|EmptySeq|_|) a = if Seq.isEmpty a then Some () else None

let s0 = Seq.empty<int>

match s0 with
| EmptySeq -> "empty"
| _ -> "not empty"

Run that in F# interactive, and the result will be "empty".
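To tie this back to the question, here is a minimal sketch of the same active pattern used inside a helper. It assumes the hrefs have already been extracted into a plain seq<string>, so it needs no HTML type provider; the name lastHrefOrEmpty is hypothetical and not part of the original code.

```fsharp
// Partial active pattern from the answer above: matches when the sequence has no elements.
let (|EmptySeq|_|) a = if Seq.isEmpty a then Some () else None

// Hypothetical helper (not from the question): returns the last href,
// or the empty string when there are no hrefs at all.
let lastHrefOrEmpty (hrefs: seq<string>) =
    match hrefs with
    | EmptySeq -> ""
    | nonEmpty -> Seq.last nonEmpty

// Example calls: one with two links, one with no links.
printfn "%s" (lastHrefOrEmpty [ "http://google.com"; "http://TheLinkIWant.com" ])
printfn "[%s]" (lastHrefOrEmpty Seq.empty)
```

Note that the emptiness check and Seq.last each start their own enumeration of the sequence, which may matter if the sequence is expensive to produce. Where only the last element is needed, Seq.tryLast returns an option and avoids the separate check.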

rmunn
13

You can use a when guard to further qualify the case:

match span with 
| sequence when Seq.isEmpty sequence -> String.Empty
| span -> span
|> Seq.collect (fun (n: HtmlNode) ->
    n.Descendants()
    |> Seq.filter (fun m -> m.HasName("a")))
|> Seq.last
|> fun n -> n.AttributeValue("href")

That said, ildjarn is right that in this case an if...then...else may be the more readable alternative.
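For a match with a when guard to type-check, both branches have to return the same type, so the rest of the pipeline needs to sit inside the non-empty branch. A minimal sketch of that shape, assuming plain strings instead of HtmlNode so it runs without the type provider (findLastOrEmpty is a hypothetical name):

```fsharp
open System

// Hypothetical stand-in for the question's helper, working on strings
// rather than HtmlNode so that it is self-contained.
let findLastOrEmpty (items: seq<string>) =
    match items with
    | s when Seq.isEmpty s -> String.Empty
    | s ->
        // The pipeline is indented under this branch, so both branches
        // of the match evaluate to a string.
        s
        |> Seq.map (fun x -> x.Trim())
        |> Seq.last

printfn "%s" (findLastOrEmpty [ " http://google.com "; " http://TheLinkIWant.com " ])
printfn "[%s]" (findLastOrEmpty Seq.empty)
```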

TeaDrivenDev
  • This shouldn't compile. `span` is of type `seq` and if `String.Empty` were defined, which it isn't as far as I know, it certainly should be of type string. Did you maybe mean to indent the pipeline? Otherwise, use a one-line early-out like `if Seq.isEmpty span then "" else` to allow opening a branch without requiring deeper indentation. – Vandroiy Aug 12 '16 at 08:54
  • That is correct; it doesn't compile. I knew that without setting up the whole thing including the HTML type provider, it wouldn't compile anyway, so I didn't make any effort to fix it. It would have been nicer to have at least the match expression compiling, yes. – TeaDrivenDev Aug 12 '16 at 19:02
  • [String.Empty](https://msdn.microsoft.com/en-us/library/system.string.empty%28v=vs.110%29.aspx) has been defined since .NET 1.0. Provided you `open System` of course. – Joel Mueller Aug 18 '16 at 20:37
5

Use a guard clause:

match myseq with
| s when Seq.isEmpty s -> "empty"
| _ -> "not empty"

Ilya Kharlamov
2

Building on the answer from @rmunn, you can make a more general sequence equality active pattern.

let (|Seq|_|) test input =
    if Seq.compareWith Operators.compare input test = 0
        then Some ()
        else None

match [] with
| Seq [] -> "empty"
| _ -> "not empty"

TheQuickBrownFox
  • BTW, you might think that naming the active pattern `Seq` would conflict with the `Seq` module, but it won't. You'd still be able to use `Seq.append` and other functions from the `Seq` module; the compiler will figure it out. Inside a match pattern, the name `Seq` will reference the active pattern; outside a pattern, the name `Seq` will continue to reference the module. – rmunn Aug 12 '16 at 09:16