I am trying to parse an HTML document with Go's encoding/xml parser. I have managed to extract all the <li> elements, but if an element contains a link <a>, the content of the link is ignored. I would like to just ignore the nested <a> tag and display its content as plain text, but I don't know how.
Here is my code:
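    d := xml.NewDecoder(resp.Body)
    // Relax the parser so it tolerates HTML (unclosed tags, HTML entities).
    d.Strict = false
    d.AutoClose = xml.HTMLAutoClose
    d.Entity = xml.HTMLEntity

    type listItem struct {
        // Captures only the character data directly inside <li>.
        Data string `xml:",chardata"`
    }

    for {
        t, err := d.Token()
        if err != nil {
            // io.EOF marks the normal end of the document.
            if err != io.EOF {
                c.Infof("token error: %v", err)
            }
            break
        }
        switch se := t.(type) {
        case xml.StartElement:
            if se.Name.Local == "li" {
                var q listItem
                if err := d.DecodeElement(&q, &se); err != nil {
                    c.Infof("decode error: %v", err)
                    continue
                }
                c.Infof("%+v\n", q)
            }
        }
    }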
Is there any way to just ignore nested elements and display their content?
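
For illustration, here is a minimal sketch of one direction that might work: instead of calling DecodeElement, keep reading tokens after the opening <li> and concatenate every xml.CharData token until the matching end tag, so text inside a nested <a> is kept and the tag itself is dropped. The helper names collectText, listItems and listItemsFromString are hypothetical (they are not part of encoding/xml), and the sample markup is made up for the example. Only Decoder.Token and the token types come from the standard library.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// collectText (hypothetical helper) reads tokens until the end tag that
// matches an already-consumed start tag. It returns all character data
// seen on the way, including text inside nested elements.
func collectText(d *xml.Decoder) (string, error) {
	var sb strings.Builder
	depth := 1 // the caller has already consumed the opening tag
	for depth > 0 {
		t, err := d.Token()
		if err != nil {
			return sb.String(), err
		}
		switch tok := t.(type) {
		case xml.StartElement:
			depth++ // entering a nested element such as <a>
		case xml.EndElement:
			depth-- // leaving a nested element, or the outer one
		case xml.CharData:
			sb.Write(tok)
		}
	}
	return sb.String(), nil
}

// listItems (hypothetical helper) returns the flattened text of every <li>.
func listItems(r io.Reader) ([]string, error) {
	d := xml.NewDecoder(r)
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity

	var items []string
	for {
		t, err := d.Token()
		if err == io.EOF {
			return items, nil
		}
		if err != nil {
			return items, err
		}
		if se, ok := t.(xml.StartElement); ok && se.Name.Local == "li" {
			text, err := collectText(d)
			if err != nil {
				return items, err
			}
			items = append(items, text)
		}
	}
}

// listItemsFromString is a small convenience wrapper for trying the sketch.
func listItemsFromString(s string) ([]string, error) {
	return listItems(strings.NewReader(s))
}

func main() {
	// Made-up sample input, not taken from the real page.
	const page = `<ul><li>plain</li><li>see <a href="http://example.com">the link</a> here</li></ul>`
	items, err := listItemsFromString(page)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range items {
		fmt.Printf("%q\n", it)
	}
}
```

With resp.Body in place of the string reader, the same loop should apply to the fetched page. Note that a <li> nested inside another <li> would be folded into the outer item's text with this approach, which may or may not be what is wanted.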