
http://doi.cnki.net/Resolution/Handler?doi=10.13345/j.cjb.180087

When I run xmlstarlet sel -t -v '//i[@class = "iconSucc"]/@class' on the file at the URL above, I get the following error messages. Does anybody know how to fix the problem?

-:77.294: Opening and ending tag mismatch: img line 77 and a
 middle;" src="/Content/images/gongshangbiaoshi.gif" alt="" class="footpic"></a>
                                                                               ^
-:77.301: Opening and ending tag mismatch: a line 77 and span
;" src="/Content/images/gongshangbiaoshi.gif" alt="" class="footpic"></a></span>
                                                                               ^
-:77.332: Opening and ending tag mismatch: span line 77 and p
i.gif" alt="" class="footpic"></a></span><br />©2014-2018中国知网(CNKI) </p
                                                                               ^
-:79.15: Opening and ending tag mismatch: p line 74 and div
        </div>
              ^
-:92.8: Opening and ending tag mismatch: div line 61 and body
</body>
       ^
-:93.8: Opening and ending tag mismatch: body line 11 and html
</html>
       ^
-:94.1: Premature end of data in tag html line 2

^
user1424739

1 Answer


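Your HTML file is not well-formed XML, which is what xmlstarlet expects as input. HTML allows <img> without an end tag, but XML requires every element to be closed. The <img> element at line 77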

<img style="height: 24px; border: 0px none; vertical-align: middle;" src="/Content/images/gongshangbiaoshi.gif" alt="" class="footpic" >

is not closed, so the parser reports a tag mismatch when it reaches the following </a>. End the tag with /> to make it well-formed:

<img style="height: 24px; border: 0px none; vertical-align: middle;" src="/Content/images/gongshangbiaoshi.gif" alt="" class="footpic" />

Then the output will be:

iconSucc

EDIT:

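Using xmllint, you can get the same result with one command: xmllint parses the file with its lenient HTML parser and re-serializes it as well-formed XML, which xmlstarlet then reads from standard input: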

xmllint -html -xmlout Handler.xml | xmlstarlet sel -t -v '//i[@class = "iconSucc"]/@class'
zx485
  • Is there a way to automatically make the input HTML well-formed? lxml does not have this problem if the input is not well-formed. – user1424739 Nov 01 '19 at 22:25
  • Search for "Converting HTML to well formed XML", or have a look at the Stack Overflow question [Converting HTML to well formed XML](https://stackoverflow.com/q/10473875/1305969). `tidy` seems to be a strong candidate. – zx485 Nov 01 '19 at 22:30
  • I added a one-line solution using `xmllint`. – zx485 Nov 01 '19 at 22:35