I have HTML stored in a Go string, and I want to extract a particular tag from it: the last one whose text contains a given pattern. To explain with an example:

    func main() {
    h := `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

    pattern := "abc"

    // Now I want the name of <a name="3"> to be printed. That is, when
    // someone searches for the pattern "abc", the last occurrence is the
    // <a> element with the name "3". If the pattern is "def" then "2"
    // should be printed; if the pattern is "text" then "4" should
    // be printed.

}

Any idea how I can do this? I played around with the template and scanner packages but could not get it working.

Sankar

2 Answers

That depends on what the HTML input looks like. If its shape is fixed and simple, you may be able to get away with a regexp, but if you're working with arbitrary HTML, you'll need a full HTML parser such as https://godoc.org/golang.org/x/net/html.

For example, using goquery (which uses x/net/html):

package main

import (
        "fmt"
        "strings"

        "github.com/PuerkitoBio/goquery"
)

func main() {
        h := `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

        pattern := "abc"

        doc, err := goquery.NewDocumentFromReader(strings.NewReader(h))
        if err != nil {
                panic(err)
        }

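        // Each visits the matches in document order, so after the loop
        // last holds the name of the final <a> whose text contains pattern.
        var last string
        var found bool
        doc.Find("a").Each(func(i int, s *goquery.Selection) {
                if strings.Contains(s.Text(), pattern) {
                        if name, ok := s.Attr("name"); ok {
                                last, found = name, true
                        }
                }
        })
        if found {
                fmt.Println(last)
        }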

}

EDIT: Alternatively, instead of the doc.Find(...).Each part, you may be able to use a :contains selector, depending on your actual input:

// Don't do this if pattern is arbitrary user input
name, ok := doc.Find(fmt.Sprintf("a:contains(%s)", pattern)).Last().Attr("name")
if ok {
        fmt.Println(name)
}
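
For the regexp route mentioned at the top of this answer, here is a minimal sketch using only the standard library. It assumes every anchor looks exactly like the ones in the question (a single double-quoted name attribute and plain text with no nested tags); the helper name lastAnchorName and the expression itself are illustrative, and this will not cope with arbitrary HTML.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchorRe matches <a name="...">text</a> where the text contains no
// nested tags. This shape is an assumption about the input.
var anchorRe = regexp.MustCompile(`<a\s+name="([^"]*)"\s*>([^<]*)</a>`)

// lastAnchorName (hypothetical helper) returns the name attribute of the
// last anchor whose text contains pattern.
func lastAnchorName(h, pattern string) (string, bool) {
	matches := anchorRe.FindAllStringSubmatch(h, -1)
	// Walk backwards so the first hit is the last occurrence in the input.
	for i := len(matches) - 1; i >= 0; i-- {
		if strings.Contains(matches[i][2], pattern) {
			return matches[i][1], true
		}
	}
	return "", false
}

func main() {
	h := `
<html>
 <body>
  <a name="0"> text </a>
  <a name="1"> abc </a>
  <a name="2"> def ghi jkl </a>
  <a name="3"> abc </a>
  <a name="4"> Some text </a>
 </body>
</html>`

	for _, p := range []string{"abc", "def", "text"} {
		if name, ok := lastAnchorName(h, p); ok {
			fmt.Println(p, "->", name)
		}
	}
}
```

Because the pattern is compared with strings.Contains rather than spliced into the regexp or a selector, it is safe to take it from user input here.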
user1431317

You can use xquery, which supports XPath expressions; it can simplify your code.

package main

import (
    "fmt"
    "strings"
    "github.com/antchfx/xquery/html"
    "golang.org/x/net/html"
)

func main() {
    htmlstr := `<html>
    <body>
    <a name="0"> text </a>
    <a name="1"> abc </a>
    <a name="2"> def ghi jkl </a>
    <a name="3"> abc </a>
    <a name="4"> Some text </a>
    </body>
    </html>`
    root, err := html.Parse(strings.NewReader(htmlstr))
    if err != nil {
        panic(err)
    }
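    // FindOne would return only the first match, so collect all matches
    // and take the last one (matches are expected in document order).
    nodes := htmlquery.Find(root, "//a[normalize-space(text())='abc']")
    if len(nodes) == 0 {
        return
    }
    last := nodes[len(nodes)-1]
    // Print the name attribute rather than the inner text.
    for _, attr := range last.Attr {
        if attr.Key == "name" {
            fmt.Println(attr.Val)
        }
    }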
}
zhengchun