19

Is there a Scala library/example that will parse a URL/URI into a case class structure for pattern matching?

Eric Hauser

4 Answers

32

Here's an extractor that will get some parts out of a URL for you:

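// Extractor that exposes the main components of a java.net.URL as a tuple.
object UrlyBurd {
  def unapply(in: java.net.URL): Option[(String, String, Int, String)] = Some((
    in.getProtocol,
    in.getHost,
    in.getPort, // -1 when the URL does not specify a port
    in.getPath
  ))
}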

val u = new java.net.URL("http://www.google.com/")

u match {
  case UrlyBurd(protocol, host, port, path) => 
    protocol + 
      "://" + host + 
      (if (port == -1) "" else ":" + port) + 
      path
}
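
If you start from a plain String rather than a java.net.URL, a similar extractor can do the parsing itself and simply fail to match on bad input. This is only a sketch: the names UrlString and describe are made up for illustration, and it assumes that "an absolute URI with a host" is what you want to count as a URL.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical extractor (the name is illustrative): matches a String and
// yields (scheme, host, port, path) only when the string parses as an
// absolute URI that has a host. Anything else does not match.
object UrlString {
  def unapply(in: String): Option[(String, String, Int, String)] =
    Try(new URI(in)).toOption
      .filter(u => u.getScheme != null && u.getHost != null)
      .map(u => (u.getScheme, u.getHost, u.getPort, u.getPath))
}

// Hypothetical helper showing the extractor in a match.
def describe(s: String): String = s match {
  // java.net.URI reports -1 when no port is given
  case UrlString(scheme, host, -1, path)   => s"$scheme://$host$path (default port)"
  case UrlString(scheme, host, port, path) => s"$scheme://$host:$port$path"
  case _                                   => "not a URL"
}
```

Because the parse failure is folded into the Option, callers never see the URISyntaxException; they just fall through to the last case.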
Michael Lorton
Alex Cruise
8

I would suggest using the extractor support that regular expressions provide.

For instance:

val URL = """(http|ftp)://(.*)\.([a-z]+)""".r

def splitURL(url : String) = url match {
  case URL(protocol, domain, tld) => println((protocol, domain, tld))
}

splitURL("http://www.google.com") // prints (http,www.google,com)

Some explanations:

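  • The .r method on strings (added through an implicit conversion: StringLike in older Scala versions, StringOps from 2.13 on) turns them into an instance of Regex.
  • Regexes define an unapplySeq method, which allows them to be used as extractors in pattern matching. (Giving the value a name that starts with a capital letter is a common convention, but it is not required here: a pattern with an argument list is always treated as an extractor.)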
  • The values that are going to be passed into the binders you use in the pattern are defined by the groups (...) in the regular expression.
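
To make the points above concrete, here is a small sketch that calls unapplySeq directly and wraps the match so that it returns a value instead of printing. The name splitURLOpt is hypothetical; the pattern is the same one as in the example.

```scala
// Same pattern as in the example; triple quotes avoid escaping the backslash.
val URL = """(http|ftp)://(.*)\.([a-z]+)""".r

// What the pattern match relies on: Some(list of groups) when the whole
// string matches the regex, None otherwise.
val groups: Option[List[String]] = URL.unapplySeq("http://www.google.com")

// Hypothetical total variant of splitURL: returns the parts as a tuple,
// and None (rather than throwing a MatchError) when the input does not match.
def splitURLOpt(url: String): Option[(String, String, String)] = url match {
  case URL(protocol, domain, tld) => Some((protocol, domain, tld))
  case _                          => None
}
```

Note that the pattern has to match the entire input, so with this particular regex an https URL yields None as well.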
Philippe
  • If I understand correctly, the regex returns Seq[String]? If I need to do more complex matching against query string parameters, I would probably parse those as well and do something like: case class Url(protocol: String, domain: String, tld: String, uri: String, queryStringParameters: Tuple2[String, String]) – Eric Hauser Sep 28 '11 at 17:15
  • @Eric the method above will return whatever is after the `=>`. As defined, it returns `Unit`, but you could change the `println...` to `(protocol, domain, tld)` for a tuple, `Seq(protocol, domain, tld)` if you want a Seq, or your case class if you define one. – Luigi Plinge Sep 28 '11 at 18:03
  • 3
    Using a regex to parse URLs is well into WORLD OF PAIN territory, but extractors are a good technique... – Alex Cruise Sep 28 '11 at 19:53
  • 6
    As the saying goes - `Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.` – Duncan McGregor Sep 28 '11 at 21:03
  • Point taken :) I would argue though that URLs are just before the final frontier of the unregexable. – Philippe Sep 28 '11 at 21:59
4

You can use Java's java.net.URL, which parses a URL into its individual components and is fully usable from Scala.
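
As a rough sketch of how that could feed a case class for pattern matching, the following uses java.net.URI from the same package. The ParsedUrl class, its field names, and the naive query-string splitting are all assumptions made for illustration, not part of any library.

```scala
import java.net.URI

// Hypothetical case class for illustration; the field names are assumptions.
case class ParsedUrl(
  scheme: String,
  host: String,
  port: Option[Int],
  path: String,
  query: Map[String, String]
)

object ParsedUrl {
  // Parses an absolute URI string; throws URISyntaxException on malformed input.
  def parse(s: String): ParsedUrl = {
    val u = new URI(s)
    // Split "a=1&b=2" into pairs; a key without "=" gets an empty value.
    // No percent-decoding is done, and a repeated key keeps its last value.
    val query = Option(u.getRawQuery).toList
      .flatMap(_.split("&").toList)
      .filter(_.nonEmpty)
      .map { kv =>
        val parts = kv.split("=", 2)
        parts(0) -> (if (parts.length > 1) parts(1) else "")
      }
      .toMap
    // java.net.URI reports -1 when no port is given
    ParsedUrl(u.getScheme, u.getHost, Some(u.getPort).filter(_ != -1), u.getPath, query)
  }
}

val summary = ParsedUrl.parse("http://example.com:8080/search?one=1&two=2") match {
  case ParsedUrl("http", host, Some(port), _, q) if q.contains("one") =>
    s"$host:$port one=${q("one")}"
  case ParsedUrl(_, host, _, path, _) =>
    s"$host$path"
}
```

The case class gives you the usual unapply for free, so guards on the query map can sit right in the pattern.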

JaimeJorge
2

The following library can help you parse URIs into an instance of a case class. (Disclaimer: it is my own library) https://github.com/theon/scala-uri

You parse like so:

import com.github.theon.uri.Uri._
val uri:Uri = "http://example.com?one=1&two=2"

It provides a DSL for building URLs with query strings:

val uri = "http://example.com" ? ("one" -> 1) & ("two" -> 2)
theon
  • Is there a way to check whether URLs are valid without having to catch exceptions? E.g., when I try `val uri: Uri = "$%&"`, I get a strange `StringIndexOutOfBoundsException` – ceran Feb 06 '16 at 13:30