3

I am working on a Grails app. I need to extract only the part of a URL up to and including .com (or .gov, .edu, .mil, .org, .net, etc.) from a string.

For example:

Input: https://stackoverflow.com/questions?=34354#es4
Output: https://stackoverflow.com/

Input: https://code.google.com/p/crawler4j/issues/detail?id=174
Output: https://code.google.com/

Can anyone suggest how it can be done? Also, if it can be done, I need to change https to http in the resulting string. Please help. Thanks.

Edit: I apologize to the downvoters for not including what I tried. This is what I tried:

URL url = new URL(website);
String webUrl = url.getprotocol()+"://"+url.getAuthority()

But I got the following error: MissingPropertyException occurred when processing request: [POST] /mypackage/resource/crawl

clever_bassi
  • Have you tried [`java.net.URI`](http://stackoverflow.com/questions/9607903/get-domain-name-from-given-url)? – Will Jul 09 '14 at 18:46
  • I am not sure why the down votes are accumulating on this question. The question describes a specific problem that @ayushi apparently doesn't know how to solve. The solution is simple but not necessarily obvious so it seems a reasonable question. What is the motivation for the down votes on the question? – Jeff Scott Brown Jul 09 '14 at 19:34
  • @JeffScottBrown Again, I suppose this is the case: *Use your downvotes whenever you encounter an egregiously sloppy, **no-effort-expended post**, or an answer that is clearly and perhaps dangerously incorrect.* Taken from [excerpts](http://stackoverflow.com/help/privileges/vote-down). – dmahapatro Jul 09 '14 at 19:37
  • Personally I think the down votes are unwarranted in this case. The question seems perfectly reasonable to me. – Jeff Scott Brown Jul 09 '14 at 20:47

3 Answers

3

Something like this satisfies the 2 examples given:

// url.host is Groovy property syntax for getHost(); the scheme is hard-coded to http
def url = new URL('http://stackoverflow.com/questions?=34354#es4')
def result = 'http://' + url.host + '/'
assert result == 'http://stackoverflow.com/'

def url2 = new URL('https://code.google.com/p/crawler4j/issues/detail?id=174')
def result2 = 'http://' + url2.host + '/'
assert result2 == 'http://code.google.com/'

EDIT:

Of course, you can shorten the concatenation with string interpolation (a GString), like this:

def url = new URL('http://stackoverflow.com/questions?=34354#es4')
def result = "http://${url.host}/"
assert result == 'http://stackoverflow.com/'

def url2 = new URL('https://code.google.com/p/crawler4j/issues/detail?id=174')
def result2 = "http://${url2.host}/"
assert result2 == 'http://code.google.com/'
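
The same idea can be wrapped in a small reusable helper. The following is only a sketch in plain Java (which can be called unchanged from Groovy); the class name `BaseUrls` and the method name `httpBase` are hypothetical, and it uses `java.net.URI` rather than `java.net.URL`, as suggested in the comments on the question. It assumes the input is an absolute URL with a host and rejects anything else.

```java
import java.net.URI;

class BaseUrls {
    // Hypothetical helper: returns "http://<host>/" for an absolute URL string.
    // The scheme is always forced to http, as the question asks.
    static String httpBase(String address) {
        // URI.create throws IllegalArgumentException on malformed input
        URI uri = URI.create(address);
        String host = uri.getHost();
        if (host == null) {
            // e.g. relative references or "mailto:" addresses have no host
            throw new IllegalArgumentException("No host in: " + address);
        }
        return "http://" + host + "/";
    }
}
```

Because only the host is used, any port, user info, path, query, and fragment in the input are dropped.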
Jeff Scott Brown
  • Thanks a lot Jeff. I have never used regular expressions. This really helped. – clever_bassi Jul 09 '14 at 20:53
  • I am sorry that your question was down voted. The reasons cited "This question does not show any research effort; it is unclear or not useful" don't seem to apply here. The question is clear, useful and I don't think the question indicates that you necessarily didn't do any research or put any effort into it. SO is so frustrating. Makes me crazy that folks want to do that sort of thing for cases like this one. Best of luck! – Jeff Scott Brown Jul 09 '14 at 20:56
  • I did research but didn't put my effort because I needed an urgent fix due to a release deadline. Thanks for helping. :) – clever_bassi Jul 09 '14 at 21:10
0

I found the error in my code as well. I mistyped getProtocol as getprotocol and kept overlooking it. It should have been:

URL url = new URL(website);
String webUrl = url.getProtocol()+"://"+url.getAuthority()

Thanks everyone for helping.
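
One thing to keep in mind with this version: `getAuthority()` also includes any port (and user info) present in the URL, and `getProtocol()` leaves `https` unchanged. A minimal sketch of a variant that keeps the authority but forces the scheme to `http` is shown below. It is plain Java, the names `AuthorityUrls` and `httpWithAuthority` are hypothetical, and it uses `java.net.URI.getAuthority()` instead of `java.net.URL` so that no checked exception needs handling.

```java
import java.net.URI;

class AuthorityUrls {
    // Hypothetical helper: "http://" + authority, where the authority may
    // include user info and a port (for example "example.com:8443").
    static String httpWithAuthority(String address) {
        String authority = URI.create(address).getAuthority();
        if (authority == null) {
            throw new IllegalArgumentException("No authority in: " + address);
        }
        return "http://" + authority;
    }
}
```

Whether the port should survive depends on the use case; if only the host name is wanted, `getHost()` is the narrower choice.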

clever_bassi
0

You can try

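String text = 'http://stackoverflow.com/questions?=34354#es4'
// split() takes a regex, so the dot is escaped to match a literal '.'
def parts = text.split(/\.com/)
return parts[0] + ".com"

This works for URLs whose host ends in .com; other TLDs such as .org or .gov would need a different split string.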
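
If string matching is preferred over URL parsing, a pattern that does not depend on a particular TLD may be more robust. The sketch below is plain Java with hypothetical names (`PrefixExtractor`, `httpPrefix`); it assumes the text starts with `http://` or `https://` and treats everything up to the first `/`, `?`, or `#` as the host part, so a port, if present, is kept.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixExtractor {
    // Scheme, then everything up to the first '/', '?' or '#'.
    private static final Pattern PREFIX =
            Pattern.compile("^https?://([^/?#]+)", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns "http://<host part>/", or null when the
    // text does not start with an http or https URL.
    static String httpPrefix(String text) {
        Matcher m = PREFIX.matcher(text);
        return m.find() ? "http://" + m.group(1) + "/" : null;
    }
}
```

Returning null for non-matching input is a design choice made for this sketch; throwing an exception would work equally well.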

Nahush Farkande