
How does one check the validity of an XML file to show where the xml error occurs?

Firefox can do it, but I'd like to do it from the Linux/Windows command line.

E.g., I've got a large-ish (90 MB) XML file from Excel, saved in XML Spreadsheet 2003 format. It contains various invalid data, so Firefox spits out messages like this:

Line Number 790402, Column 65:
<Cell ss:StyleID="s18"><Data ss:Type="String">Here's some data I&#5;?Bnternational</Data></Cell>

Firefox is quite slow at parsing my XML (presumably because it keeps it all in memory, ready to render into a nice navigable tree). I'm not bothered about validation against an XSD; I just want to know whether the XML is well-formed.

Itchydon
Dominic Rodger
  • possible duplicate of [XML Schema (XSD) validation tool?](http://stackoverflow.com/questions/124865/xml-schema-xsd-validation-tool) – kenorb Apr 09 '15 at 19:20
  • No it's not - this question explicitly mentions that it's not about validating against an XSD, whereas that question is entirely about validating an XSD. – Dominic Rodger Apr 11 '15 at 19:37

5 Answers


There's a Linux command called xmllint that is good for this. It's very fast, handles honking great files without barfing, and gives useful validation error messages.
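
For scripted use, the check can be wrapped from Python. The following is a minimal sketch, assuming `xmllint` is installed and on the `PATH`; the helper name `xmllint_check` is made up for illustration. It passes `--noout` so that only diagnostics are printed, and treats a non-zero exit status as a failed check.

```python
import shutil
import subprocess


def xmllint_check(path):
    """Run `xmllint --noout PATH`; return (ok, diagnostics)."""
    if shutil.which("xmllint") is None:
        raise FileNotFoundError("xmllint was not found on PATH")
    proc = subprocess.run(
        ["xmllint", "--noout", path],
        capture_output=True,  # xmllint writes its error messages to stderr
        text=True,
        errors="replace",     # do not crash on undecodable bytes in the echoed line
    )
    # A non-zero exit status means xmllint reported a problem.
    return proc.returncode == 0, proc.stderr
```

Calling `ok, diagnostics = xmllint_check("filename.xml")` and printing `diagnostics` when `ok` is false should show the offending line and a caret marking the column.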

skaffman
  • Cool stuff. Even validation supported... How could I ever live without it? +1 – Boldewyn Jul 17 '09 at 10:19
  • `xmllint --valid filename.xml` validates the document against its DTD in addition to the standard well-formedness check; `xmllint --schema name.xsd filename.xml` validates against a schema file. – doub1ejack Jun 28 '18 at 14:07
  • install: `sudo apt install libxml2-utils` – Elliott Beach Feb 07 '19 at 21:47
  • I found that without the `--noout` switch it'll dump out the XML as well. If you want xmllint to show you just where it's failing, use `xmllint --valid --noout filename.xml`. – Calvin Taylor Jul 14 '20 at 18:30

The Python answer, condensed into a one-liner:

python -c "import sys, xml.dom.minidom as d; d.parse(sys.argv[1])" FILE
Gringo Suave

You could use features of other languages for that. E.g., a two-liner in Python:

import xml.dom.minidom as dom
dom.parse('test.xml')

This will show the problem, and is quite performant. I remember there was an XML toolkit that worked quite well from within bash, but I can't find a link to that right now.

Cheers,

Edit: An answer to another question suggested using SAX rather than DOM, since it'd be more performant. A ready-to-use Python script would then look something like this:

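#!/usr/bin/env python
import xml.sax as sax

# Stream the file through a SAX parser; the first error raises SAXParseException.
parser = sax.make_parser()
with open('test.xml', 'rb') as source:  # binary mode: the parser detects the encoding
    parser.parse(source)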
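
To print just the error position instead of a traceback, the exception can be caught. This is a minimal sketch using only the standard library; `first_error` is a hypothetical helper name. Only the first problem is reported, because the parser stops at the first fatal error.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def first_error(source):
    """Return None if `source` is well-formed, else (line, column, message).

    `source` may be a filename or a binary file-like object.
    """
    try:
        # A bare ContentHandler ignores all events, so nothing is kept in memory.
        xml.sax.parse(source, ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None
```

For example, `print(first_error('test.xml'))` prints `None` for a well-formed file and a `(line, column, message)` tuple otherwise.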

Edit 2: I remember again, the tool was XMLStarlet. I found it to be quite nice, when I used it two years ago.

Community
Boldewyn
  • Me too, but for really large XML files you'll be happy for every bit of performance you can squeeze from the tool. – Boldewyn Jul 17 '09 at 11:03

I always recommend the XMLStarlet command-line utilities.

They provide validation, querying, formatting, and editing of documents straight from the command line. They're invaluable for this sort of work, as well as for sanity-checking documents, chopping sections out via XPath, etc.

Brian Agnew

There's yet another new-ish (from 2013) command-line tool, based on the Xerces parser, for doing XML Schema validation. It's called xjparse (https://xjparse.org). So far, I've found this tool to be slow-ish, but it is one of the more complete schema validators, especially if you happen to have XSDs that include/import other XSDs. It also seems to be available on most of the popular Linux distros.

Edit: This topic/issue recently resurfaced for me. This time I wrote a fairly simple Ruby script using the Nokogiri gem. The main trick was to first parse the XML for its XSD namespace documents, then create a "schema" document from them that could then be used to validate the original XML.

Edit 2: I made the script into a gem, available on rubygems.org:

gem install validate_xml_xsi

Hans