Is anyone familiar with OOXML schemas? I need to build a schema that validates the style of specific documents against certain rules, and I can't find a definitive reference or even an example to start from.
-
"validate the style" - you mean something like "Heading 1" must appear before "Heading 2"? What programming language do you prefer? – JasonPlutext Dec 03 '18 at 04:51
-
Yeah, exactly. So far I haven't found the right schema for styles. I don't mind which language, but Java is good – HHSE123 Dec 04 '18 at 05:13
-
I think you'll need to roll your own. For example: extract all the paragraph styles from the docx, then test whether each transition matches your rules. You can use docx4j (or POI, I expect) for this. – JasonPlutext Dec 04 '18 at 05:56
-
Yeah, I've heard of this library (docx4j). I think I will start from there. Thank you for your help – HHSE123 Dec 04 '18 at 17:03
2 Answers
The reference for OOXML is Standard ECMA-376 Office Open XML File Formats. Be forewarned that the specification is complex; expect to have to do a lot of reading/learning about both the OOXML specification and XML technologies and techniques prior to making any progress.
-
The vocabulary and grammar of *style-related* aspects of OOXML are included in the XSDs. Do realize, however, that as is often true, XSDs represent only a portion of the specification of a standard. If you wish to check everything style-related in a DOCX file with respect to the specification, you'll have to write additional code; XSD validation alone will not suffice. – kjhughes Dec 04 '18 at 13:21
-
Thank you @kjhughes, that was helpful. I think I should start with the docx4j library – HHSE123 Dec 04 '18 at 17:02
-
If you like Java, yes, docx4j is good and will be very helpful. If you like XPath, consider Schematron or XSLT to perform your checks. XSLT has the added advantage of providing powerful transformational capabilities if you also have to convert the input or produce reports against it. – kjhughes Dec 04 '18 at 17:19
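To illustrate the XPath route mentioned in the comment above, here is a minimal sketch using only the JDK's built-in XPath support. It assumes you have already read `word/document.xml` out of the DOCX zip into a string; the class and method names (`StyleXPathCheck`, `countUnstyledParagraphs`) and the rule itself (every body paragraph must name a style explicitly) are made up for illustration and are not from docx4j or any other library.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: counts paragraphs that break an example rule.
final class StyleXPathCheck {
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Counts paragraphs under w:body that have no explicit w:pStyle. */
    static int countUnstyledParagraphs(String documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for prefixed XPath steps
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "w".equals(prefix) ? W_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return W_NS.equals(uri) ? "w" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return W_NS.equals(uri)
                        ? Collections.singletonList("w").iterator()
                        : Collections.<String>emptyIterator();
                }
            });

            // Example rule (an assumption): flag paragraphs without a pStyle.
            NodeList hits = (NodeList) xpath.evaluate(
                "/w:document/w:body//w:p[not(w:pPr/w:pStyle)]",
                doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate document.xml", e);
        }
    }
}
```

The same XPath expression could serve as the test of a Schematron assertion or an XSLT template match if you prefer those tools over Java.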
Word documents are usually just a flat sequence of paragraphs and tables (table cells contain paragraphs), though you can use/nest content controls to group paragraphs, and there are other more exotic objects such as altChunks.
Other things you might be interested in:
- sectPr (controlling headers/footers, page size/orientation, etc.), since this can indicate a new part or chapter
- outline level?

The paragraph style is just a setting on a paragraph. Given this structure, Schematron might not be quite so useful as it is against, say, DocBook or TEI.
But transforming your main document part (word/document.xml) to something simpler via XSLT is potentially a good approach.
It all depends what your constraints look like.
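Because the body is essentially a flat sequence of paragraphs, a rule such as "no skipped heading levels" reduces to checking transitions in a list of style ids. The following is a rough sketch of that idea using only the JDK's DOM parser rather than docx4j. It assumes `word/document.xml` has already been read into a string and that headings use the style ids `Heading1` to `Heading9` (the usual ids in an English-language Word template, which may differ in your documents). The class name, method names, and the rule are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: extracts paragraph styles and checks heading order.
final class HeadingOrderCheck {
    static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    // Assumed style id convention; adjust to the ids your template uses.
    private static final Pattern HEADING = Pattern.compile("Heading([1-9])");

    /** Returns the first direct child in the w: namespace with this name, or null. */
    private static Element child(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && W_NS.equals(n.getNamespaceURI())
                    && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Style id of every w:p in document order ("" when none is set). */
    static List<String> paragraphStyles(String documentXml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(documentXml)));
            // Includes paragraphs nested in table cells, in document order.
            NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
            List<String> styles = new ArrayList<>();
            for (int i = 0; i < paragraphs.getLength(); i++) {
                Element pPr = child((Element) paragraphs.item(i), "pPr");
                Element pStyle = (pPr == null) ? null : child(pPr, "pStyle");
                styles.add(pStyle == null ? "" : pStyle.getAttributeNS(W_NS, "val"));
            }
            return styles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document.xml", e);
        }
    }

    /** Example rule: a heading may be at most one level deeper than the previous heading. */
    static List<String> violations(List<String> styles) {
        List<String> problems = new ArrayList<>();
        int previousLevel = 0; // 0 means no heading seen yet
        for (int i = 0; i < styles.size(); i++) {
            Matcher m = HEADING.matcher(styles.get(i));
            if (!m.matches()) {
                continue; // non-heading paragraphs do not affect the rule
            }
            int level = Integer.parseInt(m.group(1));
            if (level > previousLevel + 1) {
                problems.add("paragraph " + i + ": " + styles.get(i)
                    + " follows heading level " + previousLevel);
            }
            previousLevel = level;
        }
        return problems;
    }
}
```

Calling `HeadingOrderCheck.violations(HeadingOrderCheck.paragraphStyles(xml))` yields one message per offending paragraph. Keeping extraction and rule checking separate means the rules can be tested against plain lists of style ids, and the extraction step could later be swapped for docx4j or an XSLT transform without touching the rules.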