Hello. Suppose there is a file that contains both ordinary content and URLs. I want to extract only the URLs. Is there any URL detector in Scala? Kindly suggest any ideas. Thanks in advance.
3 Answers
For this and many other questions, you could just use a solution written for Java: How to detect the presence of URL in a string.
import java.net.URL
import scala.util.Try
val text = "abc http://stackoverflow.com stackoverflow.com http blah-blah-blah"
// Split on whitespace and keep only the tokens that java.net.URL can parse
text.split("""\s+""").map(s => Try(new URL(s))).flatMap(_.toOption)
// Array[java.net.URL] = Array(http://stackoverflow.com)
You might want to consider using java.net.URI, rather than java.net.URL. URL has some pathological behaviour. For example, methods like hashCode and equals make outgoing network connections. – James_pic May 23 '13 at 15:59
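Following the comment above, here is a minimal sketch of the same token-by-token check using java.net.URI. The names UriDetector and extractUris are made up for illustration. Note that new URI("abc") does not throw, because a bare word is a valid relative URI, so this sketch assumes you only want absolute URIs that have a host:

```scala
import java.net.URI
import scala.util.Try

object UriDetector {
  // Hypothetical helper: returns the whitespace-separated tokens of `text`
  // that parse as absolute URIs with a host component.
  def extractUris(text: String): Seq[URI] =
    text.split("""\s+""").toSeq
      .flatMap(token => Try(new URI(token)).toOption) // drop tokens that fail to parse
      .filter(uri => uri.isAbsolute && uri.getHost != null) // drop relative URIs such as "abc"
}
```

Calling UriDetector.extractUris on the sample text from the answer should keep only http://stackoverflow.com, and no network access is needed to compare the results.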
On this topic, read Extract URL from string. It doesn't matter which programming language you use; the problem is always the same. I faced the same challenge in 2011 and followed the approach posted in the accepted answer (as far as I can remember, with a small modification).
I am not sure if I understood you correctly, but you can try writing your own detector. Look at this post. After creating a correct regular expression, you can do something like this (the code assumes that URLs are on separate lines from the rest of the content):
import java.io.File
import scala.io.Source
val URL = """(http|ftp)://(.*)\.([/a-z]+)""".r
// Prints (protocol, domain, tld) when the whole line matches the pattern
def splitURL(url: String): Unit = url match {
  case URL(protocol, domain, tld) => println((protocol, domain, tld))
  case _ => () // skip lines that are not URLs
}
val f = new File("file.txt")
val lines = Source.fromFile(f).getLines()
lines.foreach(splitURL)
This is just a hint. You will probably need something more customized for your particular case.
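If the URLs are mixed into lines with other text, one possible variation is to search each line instead of matching the whole line. This is only a sketch: UrlFinder, UrlPattern and findUrls are hypothetical names, and the pattern is an assumption (a scheme, then "://", then any run of non-whitespace characters), not a complete URL grammar:

```scala
import scala.util.matching.Regex

object UrlFinder {
  // Assumed pattern: http, https or ftp scheme followed by non-whitespace characters.
  val UrlPattern: Regex = """(?:https?|ftp)://\S+""".r

  // Returns every substring of `line` that matches UrlPattern, in order of appearance.
  def findUrls(line: String): List[String] =
    UrlPattern.findAllIn(line).toList
}
```

With this you could call lines.flatMap(UrlFinder.findUrls) on the lines of the file. Because the pattern stops only at whitespace, punctuation that directly follows a URL (for example a trailing full stop) ends up in the match, so you may need to trim it.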
Edit:
You probably need a more advanced regular expression. Have a look at reporter's answer.