
I have an HTML file. Some elements in this file are marked with special attributes, say level0="level0 name", level1="level1 name", level2="name".

How can I check whether these attributes have the desired structure?

a. Levels must be nested according to their index.

b. The names on the same level should be distinct.

c. level0 must have at least one element with level1.

d. An HTML element may have only one level attribute.

Update 1: c. An HTML element with the attribute "level0" must have at least one descendant HTML element with the attribute "level1".

Update 2: It is very important that the error messages are simple and understandable.

For parsing HTML I'm using JSoup, but I'm open to alternatives. I can imagine using an XSD schema or XPath, or some combination of them in Java. I want to show simple and reasonable error messages to the user.

<body>
<div level0="lvl0-0">
  <div>
   ...
  <span level1="lvl1-0"> 
    <p level2="lvl2-0"> text goes here </p>
    <p level2="lvl2-1"> textY goes here </p>
  </span>
  <span level1="lvl1-1"> 
    <p level2="lvl2-0"> text goes here </p>
  </span>
   ...
  </div>
</div>

<div class="bla">
 <div level0="lvl0-1">
   <span level1="lvl1-0">
     <p level2="lvl2-0"> text goes here </p>
   </span>
 </div>
</div>
</body>
Tony
  • Should not be a problem to realize this with jsoup. I would suggest to design test cases first, so you can validate the behavior of your parser/validator (sort of test driven development). What problems are you facing with jsoup? – Frederic Klein Sep 01 '16 at 13:53
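As a rough illustration of the tree-walk approach the comment hints at, here is a minimal sketch that checks rules a to d and collects plain-language messages. To stay dependency-free it uses the JDK's built-in DOM parser instead of jsoup, so it assumes the input is well-formed XML; with jsoup the same walk would run over its element tree. The class name `LevelValidator`, the message wording, and the reading of rule b (names must be distinct within the same enclosing level element) are assumptions, not something the question specifies.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class LevelValidator {
    static final int MAX_LEVEL = 2; // level0 .. level2, as in the question

    /** Nearest enclosing level element; level -1 stands for the document itself. */
    private static final class Scope {
        final int level;
        final String label;
        final Set<String> childLabels = new HashSet<>();
        boolean hasNextLevel = false;
        Scope(int level, String label) { this.level = level; this.label = label; }
    }

    /** Returns one human-readable message per violation; empty list means valid. */
    static List<String> validate(String xhtml) {
        List<String> errors = new ArrayList<>();
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml))).getDocumentElement();
            walk(root, new Scope(-1, "document"), errors);
        } catch (Exception e) {
            errors.add("The input could not be parsed: " + e.getMessage());
        }
        return errors;
    }

    private static void walk(Element el, Scope scope, List<String> errors) {
        // Collect every levelN attribute present on this element.
        List<Integer> found = new ArrayList<>();
        for (int i = 0; i <= MAX_LEVEL; i++) {
            if (el.hasAttribute("level" + i)) found.add(i);
        }
        Scope next = scope;
        if (found.size() > 1) {
            // Rule d: only one level attribute per element.
            errors.add("<" + el.getTagName() + "> has more than one level attribute.");
        } else if (found.size() == 1) {
            int level = found.get(0);
            String label = "level" + level + " \"" + el.getAttribute("level" + level) + "\"";
            if (level != scope.level + 1) {
                // Rule a: the nearest enclosing level must be exactly one index lower.
                errors.add(level == 0
                        ? label + " must not be placed inside another level."
                        : label + " must be placed inside a level" + (level - 1) + " element.");
            } else {
                scope.hasNextLevel = true;
            }
            // Rule b (assumed scope): names are distinct within the same parent level.
            if (!scope.childLabels.add(label)) {
                errors.add(label + " is used more than once in " + scope.label + ".");
            }
            next = new Scope(level, label);
        }
        for (Node n = el.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) walk((Element) n, next, errors);
        }
        // Rule c: every level0 element needs at least one level1 descendant.
        if (next != scope && next.level == 0 && !next.hasNextLevel) {
            errors.add(next.label + " contains no level1 element.");
        }
    }
}
```

Calling `LevelValidator.validate(markup)` returns an empty list for a valid document; because each rule produces its own sentence, the messages can be shown to the user as they are or reworded freely.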

1 Answer


You should create an XSD and then use something like Xerces to validate the structure.

See "What's the best way to validate an XML file against an XSD file?" for a good example.
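For orientation, a minimal sketch of what such a validation could look like with the JDK's `javax.xml.validation` API (which is typically backed by a Xerces-derived implementation). The class name `XsdCheck` and the toy schema in `DEMO_XSD` are hypothetical; the schema models levels as nested elements purely to demonstrate the mechanism and is not a schema for the attribute-based markup in the question. The collected messages come from the validator and tend to be technical, so they may need rewording before being shown to end users.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class XsdCheck {
    // Hypothetical toy schema: a <level0> element must contain at least one <level1>.
    static final String DEMO_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='level0'><xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name='level1' minOccurs='1' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:attribute name='name' type='xs:string' use='required'/></xs:complexType>"
      + "</xs:element>"
      + "</xs:sequence>"
      + "<xs:attribute name='name' type='xs:string' use='required'/>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    /** Validates xml against xsd and returns the validator's messages (empty if valid). */
    static List<String> validate(String xsd, String xml) {
        List<String> messages = new ArrayList<>();
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            // Collect recoverable errors instead of stopping at the first one.
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { messages.add(format(e)); }
                public void error(SAXParseException e) { messages.add(format(e)); }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (Exception e) {
            // Malformed XML, an invalid schema, or an I/O problem ends up here.
            messages.add("Could not validate: " + e.getMessage());
        }
        return messages;
    }

    private static String format(SAXParseException e) {
        return "line " + e.getLineNumber() + ": " + e.getMessage();
    }
}
```

For example, `XsdCheck.validate(XsdCheck.DEMO_XSD, "<level0 name='a'/>")` returns a non-empty list because the required `level1` child is missing.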

Zack
  • Thank you for your answer. Does XSD generate simple error messages? – Tony Aug 31 '16 at 15:53
  • As far as I know, you cannot put restrictions such as "c. level0 must have at least one element with level1" (he's talking about attributes, not XML elements) into an XSD. – f1sh Aug 31 '16 at 15:53
  • XSD doesn't produce any error messages. It is only the element definition. The tool you use to validate your XML content against the XSD (Xerces, for example) would produce the error. – Zack Sep 07 '16 at 13:24