8

I want to parse anchor links from HTML content. A sample of my HTML:

    <a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
    <a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
    <a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">

Each anchor has an `href` attribute and I want to get its value, but my code gives me this error:

Error: multiple-value s.Attr() in single-value context

    package main

    import (
      "fmt"
      "log"

      "github.com/PuerkitoBio/goquery"
    )

    func ExampleScrape() {
      doc, err := goquery.NewDocument("http://www.myurl.com/category-s/1828.htm") 
      if err != nil {
        log.Fatal(err)
      }

      /* Sample of the HTML returned by the request:
         <a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
         <a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
         <a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">
      */

      doc.Find("table.v65-productDisplay a.productnamecolor").Each(func(i int, s *goquery.Selection) {
        band := s.Attr("href") // I want the value of the href attribute here, but this line fails to compile.
        fmt.Printf(band)
      })
    }

    func main() {
      ExampleScrape()
    }
blackgreen
Naresh
  • You should clearly ask your question in sentences and paragraphs rather than relying on the title and comments in a code block. – Dave C Aug 23 '15 at 21:11
  • Is [the `golang.org/x/net/html` package](https://godoc.org/golang.org/x/net/html/) relevant/helpful? – Dave C Aug 23 '15 at 21:12
  • @DaveC I have also tried this package, but the problem is that it does not provide CSS selectors. – Naresh Aug 23 '15 at 21:17
  • @PuzzledBoy What do you mean by that? You can get every attribute and tag with that package just fine. You just don't get the `doc.Find("tag.classattr")` syntax, although writing your own function to do that is really easy once you get familiar with the package. – user3591723 Aug 25 '15 at 01:49

2 Answers

10

Selection.Attr returns two values: the attribute value, and a boolean reporting whether the attribute exists (the value is the empty string when the boolean is false).

Go does not allow a call that returns two values to be used where a single value is expected, which is what the error message is saying. You have to receive both values, so change your code to the following:

doc.Find("table.v65-productDisplay a.productnamecolor").Each(func(i int, s *goquery.Selection) {
    band, ok := s.Attr("href")
    if ok {
        fmt.Println(band) // Println, so a % in the URL is not read as a format verb
    }
})
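
The same rule applies to any Go function that returns two values, not only to goquery. A minimal sketch of the pattern without the goquery dependency, assuming a hypothetical `lookupAttr` helper (not part of any library) that mirrors the `(value, exists)` shape of `Selection.Attr`:

```go
package main

import "fmt"

// lookupAttr is a hypothetical stand-in for Selection.Attr: it returns the
// attribute value and a boolean reporting whether the attribute exists.
func lookupAttr(attrs map[string]string, name string) (string, bool) {
	val, ok := attrs[name]
	return val, ok
}

func main() {
	attrs := map[string]string{
		"class": "productnamecolor colors_productname",
		"href":  "http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm",
	}

	// band := lookupAttr(attrs, "href") // would not compile: two values, one variable

	// Receive both values and check the flag before using the value.
	if href, ok := lookupAttr(attrs, "href"); ok {
		fmt.Println(href)
	}

	// Discard the flag with the blank identifier when a missing attribute
	// can simply be treated as an empty string.
	title, _ := lookupAttr(attrs, "title")
	fmt.Printf("title=%q\n", title)
}
```

If the existence check is not needed, goquery's `AttrOr` method returns a single string and takes a default value to use when the attribute is missing.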
3

You can also use the `golang.org/x/net/html` package.

package main

import (
    "fmt"
    "log"
    "strings"

    "golang.org/x/net/html"
)

func main() {
    // Sample input: the three anchors from the question.
    s := `<a class="productnamecolor colors_productname" href="http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm">
    <a class="productnamecolor colors_productname" href="http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm">
    <a class="productnamecolor colors_productname" href="http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm">`
    doc, err := html.Parse(strings.NewReader(s))
    if err != nil {
        log.Fatal(err)
    }
    var f func(*html.Node)
    f = func(n *html.Node) {
        if n.Type == html.ElementNode && n.Data == "a" {
            for _, a := range n.Attr {
                if a.Key == "href" {
                    fmt.Println(a.Val)
                    break
                }
            }
        }
        for c := n.FirstChild; c != nil; c = c.NextSibling {
            f(c)
        }
    }
    f(doc)
}
// outputs
// http://www.cccxcxc.com/Nautical-Bubble-Romper-p/s15brpnt03.htm
// http://www.dewewe.com/Nautical-Bubble-Romper-p/erewrwer.htm
// http://www.sdsddsd.com/Nautical-Bubble-Romper-p/dsadadasd.htm
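
The traversal above prints the `href` of every anchor. To restrict it to anchors carrying a given class, roughly what the `a.productnamecolor` part of the goquery selector does, the `class` attribute value has to be split on whitespace and compared token by token. A minimal sketch using only the standard library, where `hasClass` is a hypothetical helper (not part of `golang.org/x/net/html`):

```go
package main

import (
	"fmt"
	"strings"
)

// hasClass reports whether the space-separated class attribute value
// contains name as a whole token. It is a hypothetical helper.
func hasClass(classValue, name string) bool {
	for _, c := range strings.Fields(classValue) {
		if c == name {
			return true
		}
	}
	return false
}

func main() {
	// Class attribute value taken from the sample anchors.
	classValue := "productnamecolor colors_productname"

	fmt.Println(hasClass(classValue, "productnamecolor")) // true
	fmt.Println(hasClass(classValue, "productname"))      // false: only a substring of a token
}
```

Inside `f`, it could be called with the `Val` of the attribute whose `Key` is `class`, printing the `href` only when it returns true.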

Hope this helps someone.

Dami