
I'm writing a little web crawler, and a lot of the links on sites I'm crawling are relative (so they're /robots.txt, for example). How do I convert these relative URLs to absolute URLs (so /robots.txt => http://google.com/robots.txt)? Does Go have a built-in way to do this?

unor
hiy

3 Answers


Yes, the standard library can do this with the net/url package. Example (from the standard library):

package main

import (
    "fmt"
    "log"
    "net/url"
)

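func main() {
    // The relative reference to resolve.
    u, err := url.Parse("../../..//search?q=dotnet")
    if err != nil {
        log.Fatal(err)
    }
    // The absolute URL it is relative to.
    base, err := url.Parse("http://example.com/directory/")
    if err != nil {
        log.Fatal(err)
    }
    // Prints: http://example.com/search?q=dotnet
    fmt.Println(base.ResolveReference(u))
}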

Note that you only need to parse the absolute (base) URL once; you can then reuse it to resolve as many relative references as you like.
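For a crawler, reusing the base might look roughly like the sketch below. The helper name absoluteLinks, the sample page URL, and the sample hrefs are illustrative assumptions, not part of net/url; the helper parses the page URL once and resolves every href against it, skipping hrefs that fail to parse.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// absoluteLinks is a hypothetical helper: it parses pageURL once and
// resolves each href found on that page against it.
func absoluteLinks(pageURL string, hrefs []string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	out := make([]string, 0, len(hrefs))
	for _, href := range hrefs {
		ref, err := url.Parse(href)
		if err != nil {
			continue // skip hrefs that are not valid URL references
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	hrefs := []string{"/robots.txt", "about.html", "../images/logo.png"}
	links, err := absoluteLinks("http://google.com/some/page.html", hrefs)
	if err != nil {
		log.Fatal(err)
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

With these sample inputs the program prints http://google.com/robots.txt, http://google.com/some/about.html and http://google.com/images/logo.png, one per line.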

Not_a_Golfer

In addition to @Not_a_Golfer's solution, you can call the Parse method on the base URL. It parses its argument in the context of the base, and the argument may be either a relative or an absolute URL.

package main

import (
    "fmt"
    "log"
    "net/url"
)

func main() {
    // Parse only the base URL with url.Parse.
    base, err := url.Parse("http://example.com/directory/")
    if err != nil {
        log.Fatal(err)
    }

    // Then use the base to parse (and resolve) relative URLs.
    u, err := base.Parse("../../..//search?q=dotnet")
    if err != nil {
        log.Fatal(err)
    }

    // Prints: http://example.com/search?q=dotnet
    fmt.Println(u.String())
}

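Because the Parse method accepts both relative and absolute URLs, a crawler does not need to check which kind of link it has before resolving it. A minimal sketch, assuming a hypothetical helper named toAbsolute and made-up sample links:

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// toAbsolute is a hypothetical helper: it resolves href in the context
// of pageURL. Absolute hrefs come back unchanged; relative ones are
// resolved against the page.
func toAbsolute(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	u, err := base.Parse(href)
	if err != nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	page := "http://example.com/directory/"
	for _, href := range []string{
		"page.html",                // relative path
		"https://golang.org/doc/",  // already absolute
		"//cdn.example.com/lib.js", // scheme-relative
	} {
		abs, err := toAbsolute(page, href)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(abs)
	}
}
```

For these inputs it prints http://example.com/directory/page.html, https://golang.org/doc/ and http://cdn.example.com/lib.js. In a real crawler you would probably parse the base once per page rather than on every call, as noted in the first answer.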

KenanBek

I think you are looking for the ResolveReference method of url.URL.

package main

import (
    "fmt"
    "log"
    "net/url"
)

func main() {
    u, err := url.Parse("../../..//search?q=dotnet")
    if err != nil {
        log.Fatal(err)
    }
    base, err := url.Parse("http://example.com/directory/")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(base.ResolveReference(u))
}
// gives: http://example.com/search?q=dotnet

I use it in my crawler as well, and it works like a charm!

Iman Mirzadeh