1

I want to count how many times tag1 occurs, given this 123.xml file (streamed from the internet):

<startend>

 <tag1 name=myname>
<date>10-10-10</date>
</tag1 >

 <tag1 name=yourname>
   <date>11-10-10</date>
  </tag1 >

 </startend>

Using: xmlstarlet sel -t -v "count(//tag1)" 123.xml

Output:

AttValue: " or ' expected attributes construct error

How can I make xmlstarlet ignore the fact that the attribute values are not quoted?

RomanPerekhrest
  • 2
    What you have is not XML. It's a bunch of text with a few angle brackets here and there. There is no way you can feed that to xmlstarlet. There are two options. 1) Fix the producer of this mess, if you can. 2) If you can't, use html tidy in XML mode to repair the input before you give it to xmlstarlet. – Tomalak Nov 14 '17 at 11:38

2 Answers

3

Your input XML/HTML structure has invalid tags/attributes and has to be repaired before it can be queried:

xmlstarlet solution:

xmlstarlet fo -o -R -H -D 123.xml 2>/dev/null | xmlstarlet sel -t -v "count(//tag1)" -n

The output:

2

Details:

  • fo (or format) - Format XML document(s)
  • -o or --omit-decl - omit xml declaration
  • -R or --recover - try to recover what is parsable
  • -D or --dropdtd - remove the DOCTYPE of the input docs
  • -H or --html - input is HTML
  • 2>/dev/null - suppress errors/warnings
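Since the file in the question is streamed from the internet, the same two-step pipeline can be wrapped so that it reads from standard input. This is only a minimal sketch: `count_tag1` is a hypothetical helper name, the URL in the usage comment is a placeholder, and it assumes that `xmlstarlet fo` reads standard input when no file is named (as the `sel` step above already does).

```shell
#!/bin/sh
# Sketch: count <tag1> elements in malformed XML arriving on stdin.
# count_tag1 is a hypothetical helper name, not part of xmlstarlet.
count_tag1() {
    # Step 1: repair the input with the lenient HTML parser (-R -H),
    #         drop any DOCTYPE (-D) and omit the XML declaration (-o);
    #         parser complaints are discarded via 2>/dev/null.
    # Step 2: evaluate the XPath count on the repaired document.
    xmlstarlet fo -o -R -H -D 2>/dev/null |
        xmlstarlet sel -t -v "count(//tag1)" -n
}

# Usage with a streamed document (placeholder URL):
#   curl -s "https://example.com/123.xml" | count_tag1
# Usage with the local file from the question:
#   count_tag1 < 123.xml
```

Note that discarding stderr also hides genuine failures, so it may be worth dropping `2>/dev/null` while debugging.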
RomanPerekhrest
  • The first invocation of **xmlstarlet** is an *excellent* solution to repairing XML problems -- added to my canonical [**How to parse bad XML answer**](https://stackoverflow.com/a/44765546/290085) with citation. Very nice! – kjhughes Nov 14 '17 at 12:33
0

XML always requires quotes around attribute values. If you want to keep using XML tools, you must first produce valid XML from your input. You could use an SGML processor such as OpenSP (in particular, its osx program) to convert your input into well-formed XML. It's as simple as invoking osx <your input file>.

If you're on Ubuntu/Debian Linux, you can install osx by invoking sudo apt-get install opensp on the command line (and similarly on other Unix systems).

imhotap