I have a series of XML files that look something like this:

<ROOT>
    <F P=100> Some text here </F>
    <F P=101> More text </F>
    ...
</ROOT>

I'm trying to parse the XML with the standard DOM parser, but because the attribute values for P are not in quotes, Java complains.

I tried using JTidy to clean it up, but because my XML isn't HTML, Tidy throws errors complaining that it doesn't recognize tags such as <ROOT> and <F>.

So, is there another way to do this? Alternatively, I guess I could use a regex, since the only attributes without quotes occur in the <F> tags. Any thoughts on either?

Thanks in advance

neptune
  • There was a similar [question](http://stackoverflow.com/questions/5964200/how-can-i-make-my-xml-safe-for-parsing-when-it-has-character-in-it) a few days ago. Your case is perhaps easy to handle with a regex iff you have few tags and attributes to work with. Otherwise, I think the real solution is to go back to the provider of those files and ask them to produce valid XML. – MarcoS May 13 '11 at 07:26
  • Never mind, fixed it. All I had to do was set `tidy.setXmlTags(true)` so that Tidy treats the input as XML and not HTML. – neptune May 13 '11 at 07:33
  • Please don't call them XML files when they aren't; it's very confusing. Call them non-XML files, and then we'll all know what you are talking about. It also makes it clear that you need non-XML tools to process them. – Michael Kay May 13 '11 at 14:29

1 Answer

All I had to do was set `tidy.setXmlTags(true)` so that Tidy treats the input as XML and not HTML.

– sheldon
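
For the regex route raised in the question, a minimal sketch might look like the one below. It assumes, as in the sample, that the only unquoted attribute is `P` on `<F>` start tags and that its values contain no whitespace, quotes, slashes or angle brackets. The class and method names (`UnquotedAttributeFixer`, `quoteAttributes`, `parse`) are made up for illustration; this is not a general repair tool for malformed markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class UnquotedAttributeFixer {

    // Matches an <F> start tag whose single attribute P has an unquoted value.
    // Assumption: values contain no whitespace, quotes, '/', '<' or '>'.
    // Tags where P is already quoted do not match and are left untouched.
    private static final Pattern UNQUOTED_P =
            Pattern.compile("<F\\s+P=([^\\s\"'<>/]+)\\s*>");

    // Wraps the captured value in double quotes.
    public static String quoteAttributes(String raw) {
        return UNQUOTED_P.matcher(raw).replaceAll("<F P=\"$1\">");
    }

    // Repairs the text first, then hands it to the standard DOM parser.
    public static Document parse(String raw) throws Exception {
        byte[] bytes = quoteAttributes(raw).getBytes(StandardCharsets.UTF_8);
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String raw = "<ROOT>\n"
                + "    <F P=100> Some text here </F>\n"
                + "    <F P=101> More text </F>\n"
                + "</ROOT>";
        // Show the repaired text, then the number of <F> elements found.
        System.out.println(quoteAttributes(raw));
        Document doc = parse(raw);
        System.out.println(doc.getElementsByTagName("F").getLength());
    }
}
```

The pattern is deliberately narrow. A broader "quote any attribute" regex can easily corrupt text content or values that are already quoted, so if the files contain more varied markup, the Tidy approach above or fixing the files at the source is probably safer.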

Qtax
  • I know this comes very late, but do you happen to have the piece of JTidy code you used to tidy the XML? I have a very similar problem and JTidy does not work for me. Thanks so much. – Harry Oct 22 '12 at 18:25
  • @Harish, this is a direct quote from the comments on the question above. You can try asking the poster there. – Qtax Oct 23 '12 at 21:04