-3

Hello. Suppose I have a file that contains both ordinary text and URLs, and I want to extract only the URLs. Is there a URL detector in Scala? Kindly suggest any ideas. Thanks in advance.

user2401547

3 Answers

2

For this and many other questions, you can simply use a solution written for Java.

See: How to detect the presence of URL in a string.

import java.net.URL
import scala.util.Try

val text = "abc http://stackoverflow.com stackoverflow.com http blah-blah-blah"

// Split on whitespace and keep only the tokens that parse as a URL
text.split("""\s+""").flatMap(s => Try(new URL(s)).toOption)
// Array[java.net.URL] = Array(http://stackoverflow.com)
senia
  • You might want to consider using java.net.URI, rather than java.net.URL. URL has some pathological behaviour. For example, methods like hashCode and equals make outgoing network connections. – James_pic May 23 '13 at 15:59
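
A minimal sketch of the java.net.URI variant suggested in the comment above. It assumes that only absolute URIs with a host component should count as URLs; `UriExtractor` and `extractUris` are hypothetical names used for illustration.

```scala
import java.net.URI
import scala.util.Try

object UriExtractor {
  // Hypothetical helper: keeps the whitespace-separated tokens that parse as
  // absolute URIs with a host component. Parsing a URI does no network access.
  def extractUris(text: String): Seq[URI] =
    text.split("""\s+""").toSeq
      .flatMap(token => Try(new URI(token)).toOption)
      // new URI("abc") succeeds as a relative URI, so filter explicitly
      .filter(uri => uri.isAbsolute && uri.getHost != null)
}
```

Unlike `new URL(s)`, `new URI(s)` accepts bare words as relative references, which is why the explicit filter is needed here.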
0

On this topic, read "Extract URL from string". It doesn't matter which programming language you use; the problem is always the same. I faced the same challenge in 2011 and took the approach posted in the accepted answer there (with a small modification, as far as I can remember).
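
A regex-based approach of this kind might look as follows in Scala. This is only a sketch: the pattern is a deliberately simple assumption, not the one from the linked answer, and `RegexUrlFinder` and `findUrls` are hypothetical names.

```scala
import scala.util.matching.Regex

object RegexUrlFinder {
  // Assumed, simplified pattern: an http, https or ftp scheme, "://", then
  // any run of characters that are not whitespace or angle brackets.
  val UrlPattern: Regex = """(?i)\b(?:https?|ftp)://[^\s<>]+""".r

  // Returns every match in the text, in order of appearance.
  def findUrls(text: String): List[String] =
    UrlPattern.findAllIn(text).toList
}
```

Because the pattern scans the whole text, it also finds URLs embedded in the middle of a line. It will include trailing punctuation such as a sentence-ending period, so a stricter pattern is likely needed for real input.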

Reporter
0

I am not sure I understood you correctly, but you can try writing your own detector. Look at this post. Once you have a correct regular expression, you can do something like this (the code assumes that each URL is on its own line, separate from the rest of the content):

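import java.io.File

// Simplified pattern: protocol, everything up to the last dot, then the rest
val URL = """(http|ftp)://(.*)\.([/a-z]+)""".r

// Prints the captured parts of a line that consists entirely of a URL
def splitURL(url: String): Unit = url match {
  case URL(protocol, domain, tld) => println((protocol, domain, tld))
  case _ => () // not a URL line: skip
}

val f = new File("file.txt")
val lines = scala.io.Source.fromFile(f).getLines()

lines.foreach(splitURL)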

This is just a hint. You will probably need something more customized for your particular case.
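
To see where customization is needed, here is a sketch of a variant that returns the captured groups instead of printing them. It reuses the pattern from the code above; `UrlLineParser` and `parse` are hypothetical names.

```scala
object UrlLineParser {
  // Same pattern as in the code above.
  private val UrlLine = """(http|ftp)://(.*)\.([/a-z]+)""".r

  // Returns the captured parts when the whole line matches, None otherwise.
  def parse(line: String): Option[(String, String, String)] = line match {
    case UrlLine(protocol, domain, tld) => Some((protocol, domain, tld))
    case _                              => None
  }
}
```

With this pattern, `UrlLineParser.parse("http://stackoverflow.com/questions")` puts the path into the third group (`"com/questions"`), and an `https` line does not match at all, which illustrates why a more careful expression is usually required.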

Edit:

You will probably need a more advanced regular expression. Have a look at Reporter's answer.

rarry