I have this string, with one URL per line:
http://www.amazon.com/books-used-books-textbooks/b%3Fie%3DUTF8%26node%3D283155
http://www.amazon.com/gp/site-directory
http://www.amazon.com/gp/goldbox
https://en.wikipedia.org/wiki/A
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:GLRqJLrDZEQJ:https://en.wikipedia.org/wiki/A%252Ba%26gbv%3D1%26%26ct%3Dclnk
https://twitter.com/a%3Flang%3Den
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:4teZIJ7lbgsJ:https://twitter.com/a%3Flang%253Den%252Ba%26gbv%3D1%26%26ct%3Dclnk
http://dictionary.reference.com/browse/a
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:Pn8j0e0faiAJ:http://dictionary.reference.com/browse/a%252Ba%26gbv%3D1%26%26ct%3Dclnk
http://boards.4chan.org/a/
I need to grab everything up to the end of the ".com", ".org", or ".net" part of each URL, including the slash that follows it.
The expected output should look like this:
http://www.amazon.com/
https://en.wikipedia.org/
http://dictionary.reference.com/
http://webcache.googleusercontent.com/
http://boards.4chan.org/
So far I've tried a few things:
/(\/)([^\/]+)\Z/
^(http[s]?)(...)\w{3}\
/https?:\/\/[\S]/
None of them worked, so now I'm here. If there's an easier way to do it, please let me know. I also need to remove any duplicates.
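To make the requirement concrete, here is a minimal sketch of the behaviour I'm after, written in Python purely for illustration (the language, the names `text`, `BASE_URL`, and `base_urls`, and the shortened sample input are all assumptions, not part of my real code). It anchors the pattern at the start of each line so that URLs embedded inside the Google cache links are not picked up, and it drops duplicates while keeping first-seen order. Note that on the full list above it would also return https://twitter.com/, and the order may differ from the expected output I listed.

```python
import re

# Shortened sample: a few of the URLs from the string above, one per line.
text = """http://www.amazon.com/books-used-books-textbooks/b%3Fie%3DUTF8%26node%3D283155
http://www.amazon.com/gp/site-directory
https://en.wikipedia.org/wiki/A
http://webcache.googleusercontent.com/search%3Fhl%3Den%26biw%26bih%26q%3Dcache:GLRqJLrDZEQJ:https://en.wikipedia.org/wiki/A%252Ba%26gbv%3D1%26%26ct%3Dclnk
http://boards.4chan.org/a/"""

# ^ with re.MULTILINE matches only at the start of a line, so a URL nested
# inside a cache link is skipped. The host must end in .com, .org, or .net
# and be followed by a slash.
BASE_URL = re.compile(r"^https?://[^/\s]+\.(?:com|org|net)/", re.MULTILINE)


def base_urls(s):
    """Return the unique base URLs found at line starts, in first-seen order."""
    # dict.fromkeys preserves insertion order and discards repeated keys.
    return list(dict.fromkeys(BASE_URL.findall(s)))


for url in base_urls(text):
    print(url)
```

The regex only handles the three top-level domains I mentioned; a URL parser would probably be more robust for anything beyond that.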