You can use Jsoup (http://jsoup.org/).
I use it from Scala, but it works the same way from Java (it's a Java library to begin with).
For example:
    import org.jsoup.Jsoup

    // get() returns an org.jsoup.nodes.Document (the parsed page), not a String
    val connection = Jsoup.connect(url)
      .followRedirects(false) // otherwise you'll get into a loop
      .timeout(3000) // in milliseconds; also to avoid looping
      .userAgent("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36") // just copied from Google
      .referrer("http://www.google.com")
      .get()
This just fetches the html page; after that you can parse it easily with the variables below.
I also normalized the url before passing it in -> if (url.startsWith("http://") || url.startsWith("https://")) url else "http://" + url
but you don't have to if you know all urls are valid.
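The scheme check above can be sketched in plain Java as a small helper. The class and method names (`UrlNormalizer`, `withScheme`) are made up for illustration and are not part of jsoup; trimming whitespace and ignoring case in the prefix test are my own assumptions on top of the original one-liner.

```java
import java.util.Locale;

// Hypothetical helper, not part of jsoup: prepend "http://" when the URL has no scheme.
final class UrlNormalizer {
    private UrlNormalizer() {}

    static String withScheme(String url) {
        // Assumption: surrounding whitespace is noise and the scheme may be upper-case.
        String trimmed = url.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return trimmed; // already has a scheme, leave it alone
        }
        return "http://" + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(withScheme("example.com"));         // http://example.com
        System.out.println(withScheme("https://example.com")); // https://example.com
    }
}
```

The result of `withScheme(url)` is what you would hand to `Jsoup.connect(...)`.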
Then make another variable:
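    import scala.collection.JavaConverters._ // for .asScala (scala.jdk.CollectionConverters._ on Scala 2.13+)

    // collect the href of every element whose href contains "facebook.com"
    val urls = connection
      .getElementsByAttributeValueContaining("href", "facebook.com")
      .iterator()
      .asScala
      .toList
      .map(x => x.attr("href"))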
That's just an example; you can look for any other url in the html page. The second param here is a plain substring, not a regex: it finds every element whose attribute value contains that text (case-insensitive).
The map then goes over all the elements that matched and pulls out whichever attribute you ask for. Here I asked for the href, but you can ask for any other attribute.
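Or you can also use:

    // collect the href of every element whose type attribute matches the regex
    val feedUrls = connection
      .getElementsByAttributeValueMatching("type", "rss|atom")
      .iterator()
      .asScala // from scala.collection.JavaConverters._
      .toList
      .map(x => x.attr("href"))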
This one is for when you need a pattern rather than a fixed string: the second param here is a regex, and an element matches when the regex is found anywhere in the attribute value, so "rss|atom" also hits type="application/rss+xml". As before, the map goes over all the matched elements and pulls out whichever attribute you ask for. Here I asked for the href, but you can ask for any other attribute.
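To see why a short pattern like "rss|atom" is enough for the type attribute, here is a small plain-Java sketch that uses only java.util.regex and does not call jsoup. As far as I know, jsoup applies the pattern find-style (anywhere in the value) rather than against the whole string; the class name `RegexFindDemo` and the sample type values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Illustration only: contrasts "found anywhere" with "whole string must match".
final class RegexFindDemo {
    private RegexFindDemo() {}

    // Keep the values in which the regex occurs somewhere (Matcher.find()).
    static List<String> findAnywhere(String regex, List<String> values) {
        Pattern p = Pattern.compile(regex);
        return values.stream()
                .filter(v -> p.matcher(v).find())
                .collect(Collectors.toList());
    }

    // Keep the values that the regex covers from start to end (Matcher.matches()).
    static List<String> matchWhole(String regex, List<String> values) {
        Pattern p = Pattern.compile(regex);
        return values.stream()
                .filter(v -> p.matcher(v).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample values for a <link type="..."> attribute.
        List<String> types = Arrays.asList(
                "application/rss+xml", "application/atom+xml", "text/css", "rss");
        System.out.println(findAnywhere("rss|atom", types)); // [application/rss+xml, application/atom+xml, rss]
        System.out.println(matchWhole("rss|atom", types));   // [rss]
    }
}
```

If you do want an exact match, anchoring the pattern yourself (for example "^(rss|atom)$") gives the stricter behaviour with a find-style matcher.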
Hope this helps...