Is there a Scala library/example that will parse a URL/URI into a case class structure for pattern matching?
Here's an extractor that will get some parts out of a URL for you:
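```scala
// Extractor that pulls the main components out of a java.net.URL
object UrlyBurd {
  def unapply(in: java.net.URL): Option[(String, String, Int, String)] = Some((
    in.getProtocol,
    in.getHost,
    in.getPort, // -1 when the URL has no explicit port
    in.getPath
  ))
}

val u = new java.net.URL("http://www.google.com/")
// Reassemble the URL from its parts; yields "http://www.google.com/"
val rebuilt = u match {
  case UrlyBurd(protocol, host, port, path) =>
    protocol + "://" + host +
      (if (port == -1) "" else ":" + port) +
      path
}
println(rebuilt)
```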

Michael Lorton

Alex Cruise
- For bonus points, one could have a sequence extraction for the path. :-) – Daniel C. Sobral Oct 02 '11 at 01:20
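Daniel Sobral's bonus idea can be sketched with an `unapplySeq` extractor. This is a minimal illustration, not code from the thread: `PathSegments` and `route` are hypothetical names, and the sketch works on `java.net.URI` rather than `java.net.URL`. Empty segments (from leading or doubled slashes) are dropped, and an opaque URI such as `mailto:...` has no path, so it falls through to the last case.

```scala
import java.net.URI

// Hypothetical extractor: splits the path into its non-empty segments
// so they can be matched as a sequence. getPath is null for opaque URIs.
object PathSegments {
  def unapplySeq(uri: URI): Option[Seq[String]] =
    Option(uri.getPath).map(_.split('/').toSeq.filter(_.nonEmpty))
}

def route(uri: URI): String = uri match {
  case PathSegments("users", id)          => s"user $id"
  case PathSegments("users", id, "posts") => s"posts of user $id"
  case PathSegments()                     => "root"
  case PathSegments(first, _*)            => s"other, starting with $first"
  case _                                  => "no path"
}

println(route(new URI("http://example.com/users/42")))       // user 42
println(route(new URI("http://example.com/users/42/posts"))) // posts of user 42
println(route(new URI("http://example.com/")))               // root
```

Because the extractor returns a `Seq`, patterns of different lengths can sit side by side, and `_*` soaks up whatever is left.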
I would suggest using the extractors that regular expressions provide.
For instance:
```scala
val URL = """(http|ftp)://(.*)\.([a-z]+)""".r

def splitURL(url: String): Unit = url match {
  case URL(protocol, domain, tld) => println((protocol, domain, tld))
  case _                          => println(s"not a URL: $url") // avoid a MatchError
}

splitURL("http://www.google.com") // prints (http,www.google,com)
```
Some explanations:
- The `.r` method on strings (actually, on `StringLike`s) turns them into an instance of `Regex`. `Regex`es define an `unapplySeq` method, which allows them to be used as extractors in pattern matching. (By convention they are given a name that starts with a capital letter, although a lowercase name also works in an extractor pattern such as `case url(a, b, c)`.)
- The values passed into the binders you use in the pattern are defined by the groups `(...)` in the regular expression.
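To see the mechanics described above, you can call the generated `unapplySeq` directly. This sketch reuses the answer's `URL` pattern; `UnanchoredURL` and `findUrl` are hypothetical names. By default the whole input must match the pattern, while `Regex.unanchored` lets it match a URL embedded in longer text.

```scala
val URL = """(http|ftp)://(.*)\.([a-z]+)""".r

// One String per capturing group inside Some(...), or None if the
// whole input does not match.
println(URL.unapplySeq("http://www.google.com")) // Some(List(http, www.google, com))
println(URL.unapplySeq("not a url"))             // None

// unanchored: the pattern may match anywhere in the input
val UnanchoredURL = URL.unanchored

def findUrl(text: String): Option[(String, String, String)] = text match {
  case UnanchoredURL(protocol, domain, tld) => Some((protocol, domain, tld))
  case _                                    => None
}

println(findUrl("see http://www.google.com here")) // Some((http,www.google,com))
```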

Philippe
- If I understand correctly, the regex returns `Seq[String]`? If I need to do more complex matching against query string parameters, I would probably parse those as well and do something like: `case class Url(protocol: String, domain: String, tld: String, uri: String, queryStringParameters: Tuple2[String, String])` – Eric Hauser Sep 28 '11 at 17:15
- @Eric the method above will return whatever is after the `=>`. As defined, it returns `Unit`, but you could change the `println...` to `(protocol, domain, tld)` for a tuple, `Seq(protocol, domain, tld)` if you want a Seq, or your case class if you define one. – Luigi Plinge Sep 28 '11 at 18:03
- Using a regex to parse URLs is well into WORLD OF PAIN territory, but extractors are a good technique... – Alex Cruise Sep 28 '11 at 19:53
- As the saying goes: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." – Duncan McGregor Sep 28 '11 at 21:03
- Point taken :) I would argue though that URLs are just before the final frontier of the unregexable. – Philippe Sep 28 '11 at 21:59
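Following Luigi Plinge's suggestion, the match can return a case class directly. The sketch below is hypothetical: `Url`, `UrlPattern`, `parseQuery` and `toUrl` are illustrative names, the query parameters are stored as a list of pairs instead of the single `Tuple2` from Eric's comment, and the extended regex still ignores ports, user info, fragments and percent-decoding, which is exactly the kind of gap the comments above warn about.

```scala
// Hypothetical case class modelled on Eric Hauser's comment.
final case class Url(
    protocol: String,
    domain: String,
    tld: String,
    path: String,
    query: List[(String, String)]
)

// The answer's regex, extended with an optional path and query string.
// Groups that do not participate in the match come back as null.
val UrlPattern = """(http|ftp)://([^/?]*)\.([a-z]+)(/[^?]*)?(?:\?(.*))?""".r

// "a=1&b" => List(("a", "1"), ("b", "")); no percent-decoding is done.
def parseQuery(q: String): List[(String, String)] =
  q.split('&').toList.filter(_.nonEmpty).map { pair =>
    val i = pair.indexOf('=')
    if (i < 0) (pair, "") else (pair.take(i), pair.drop(i + 1))
  }

def toUrl(s: String): Option[Url] = s match {
  case UrlPattern(protocol, domain, tld, path, query) =>
    Some(Url(protocol, domain, tld,
      Option(path).getOrElse(""),
      Option(query).map(parseQuery).getOrElse(Nil)))
  case _ => None
}

// Pattern-match on the case class, e.g. to look for a specific parameter.
toUrl("http://www.google.com/search?q=scala&lang=en") match {
  case Some(Url("http", domain, _, "/search", params)) =>
    println(s"search on $domain with ${params.toMap.get("q")}")
  case other =>
    println(s"something else: $other")
}
```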
You can use Java's `java.net.URL`, which parses a URL into its components (protocol, host, port, path, query and so on) and can be used directly from Scala.
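To make this concrete, here is a sketch that lets the JDK do the parsing and copies the components into a case class you can pattern-match on. It uses `java.net.URI`, the parse-only counterpart of `URL` (whose constructors were deprecated in JDK 20). `ParsedUri` and `parseUri` are hypothetical names chosen for illustration.

```scala
import java.net.URI

// Hypothetical case class; field names are illustrative only.
final case class ParsedUri(
    scheme: String,
    host: String,          // null for URIs without an authority, e.g. mailto:
    port: Option[Int],     // None when no explicit port is given
    path: String,
    query: Option[String]  // decoded query string, if present
)

// new URI(...) throws URISyntaxException on malformed input.
def parseUri(s: String): ParsedUri = {
  val u = new URI(s)
  ParsedUri(
    u.getScheme,
    u.getHost,
    if (u.getPort == -1) None else Some(u.getPort),
    u.getPath,
    Option(u.getQuery)
  )
}

parseUri("http://www.google.com:8080/search?q=scala") match {
  case ParsedUri("http" | "https", host, Some(port), path, Some(query)) =>
    println(s"host=$host port=$port path=$path query=$query")
  case other =>
    println(s"something else: $other")
}
```

Mapping the `-1` port and the nullable query to `Option` keeps the sentinel values out of the patterns.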

JaimeJorge
The following library can help you parse URIs into an instance of a case class. (Disclaimer: it is my own library) https://github.com/theon/scala-uri
You parse like so:
```scala
import com.github.theon.uri.Uri._

// The import brings an implicit String => Uri conversion into scope
val uri: Uri = "http://example.com?one=1&two=2"
```
It provides a DSL for building URLs with query strings:
```scala
val uri = "http://example.com" ? ("one" -> 1) & ("two" -> 2)
```

theon
- Is there a way to check whether URLs are valid without having to catch exceptions? For example, when I try `val uri: Uri = "$%&"`, I get a strange `StringIndexOutOfBoundsException`. – ceran Feb 06 '16 at 13:30
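On ceran's question: this does not use scala-uri's own API, but one stdlib-only option is to let `java.net.URI` do a syntax check and wrap the call in `scala.util.Try`, so malformed input becomes `None` instead of a thrown exception. `URI` also accepts relative references such as `hello`, so the sketch adds a stricter variant that requires an absolute URI with a host. Both function names are hypothetical.

```scala
import java.net.URI
import scala.util.Try

// Syntax check only: any string java.net.URI can parse (including
// relative references like "hello") yields Some.
def parseUriOption(s: String): Option[URI] = Try(new URI(s)).toOption

// Stricter variant: additionally require a scheme and a host.
def parseAbsoluteUri(s: String): Option[URI] =
  parseUriOption(s).filter(u => u.isAbsolute && u.getHost != null)

println(parseUriOption("$%&"))                                             // None
println(parseAbsoluteUri("http://example.com?one=1&two=2").map(_.getHost)) // Some(example.com)
```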