
Background:

(Not too relevant, so you can skip the background if you'd like.)
We have a Word docx template document (which may or may not contain a numbered/bullet list) containing some tags that we replace. We already have code to replace our tags with data. This data, however, can itself contain certain elements, including numbered/bullet lists (converted from HTML to docx format). All of this works well, except for one issue: a docx zip needs to contain a word/numbering.xml with a definition for all of its lists. I'm currently merging the existing numbering.xml from the Word docx template with another numbering.xml from our HTML-to-docx converted values. I've already incremented the numIds and abstractNumIds in the numbering.xml I want to merge in, and now want to actually merge the elements.

What I'm currently trying to accomplish:

So I have the contents of two numbering.xml files (as String) that I'm now trying to merge in Java. Hugely simplified, I have these two files:

<?xml version="1.0" encoding="UTF-8"?>
<parent>
  <A id="1">...</A>
  <A id="2">...</A>
  <B id="1">...</B>
  <B id="2">...</B>
</parent>

And:

<?xml version="1.0" encoding="UTF-8"?>
<parent>
  <A id="3">...</A>
  <B id="3">...</B>
</parent>

And I want them merged into:

<?xml version="1.0" encoding="UTF-8"?>
<parent>
  <A id="1">...</A>
  <A id="2">...</A>
  <A id="3">...</A>
  <B id="1">...</B>
  <B id="2">...</B>
  <B id="3">...</B>
</parent>

So basically have all <A> elements grouped together and all <B> elements grouped together, while retaining order.

Current problem:

After some googling, I found that I could use the Java code provided in the body of this Stack Overflow question to merge the children of the <parent> elements. But that would result in A1,A2,B1,B2,A3,B3 instead of A1,A2,A3,B1,B2,B3. Unfortunately, the Word docx doesn't work in that case (I've tested this), since element order in (OO)XML is important.

So, does anyone know how to merge the elements of two XML files, grouping child-elements together as mentioned above?

Kevin Cruijssen
    How about unmarshalling to A and B lists, sorting by id, merging to a LinkedHashSet and marshall it? – LMC Oct 05 '22 at 15:12

1 Answer

Looking at this from an XSLT perspective (as I invariably do), in your example it's simply a case of concatenating the two sequences, and then sorting by a two-part key comprising (a) the element name, and (b) the value of the id attribute. In XSLT 3.0, you can optimize this a little with xsl:merge.
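Since the question asks about Java, the same "concatenate, then sort by a two-part key" idea can be sketched with the JDK's DOM API instead of XSLT. This is only a minimal sketch under stated assumptions: the class name `SortMerge` and its helper methods are hypothetical, the key attribute is the simplified `id` from the example (assumed to be present and numeric on every child), and whitespace between children is dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the question or the answer.
class SortMerge {

    static Document parse(DocumentBuilder db, String xml) throws Exception {
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Collects only the element children, skipping whitespace text nodes.
    static List<Element> childElements(Element parent) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                out.add((Element) n);
            }
        }
        return out;
    }

    static String merge(String first, String second) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document target = parse(db, first);
        Element root = target.getDocumentElement();

        // Concatenate: children of the first file, then imported copies from the second.
        List<Element> all = childElements(root);
        for (Element e : childElements(parse(db, second).getDocumentElement())) {
            all.add((Element) target.importNode(e, true));
        }

        // Two-part key: (a) element name, (b) numeric value of the "id" attribute.
        // Assumption: every child has a numeric "id"; otherwise parseInt throws.
        all.sort(Comparator.comparing((Element e) -> e.getNodeName())
                .thenComparingInt(e -> Integer.parseInt(e.getAttribute("id"))));

        // Rebuild the root with the sorted children.
        while (root.getFirstChild() != null) {
            root.removeChild(root.getFirstChild());
        }
        for (Element e : all) {
            root.appendChild(e);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(target), new StreamResult(w));
        return w.toString();
    }
}
```

Calling `SortMerge.merge(firstXml, secondXml)` on the two sample files should yield the `A` elements with ids 1 to 3 followed by the `B` elements with ids 1 to 3. Like the XSLT version, it only gives the right group order when sorting the element names alphabetically happens to match the required order.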

But this assumes the element names are sorted (A, B, ...) as in your example, and I recognise this might not actually be the case. But that then raises questions about exactly what the input sequences might be if they are not exactly as in your example. Can we rely on the fact that partitioning each sequence by element name will produce exactly the same sequence of element names for each of the two sequences? If that's the case then you could use xsl:for-each-group group-adjacent="node-name()" to partition each of the input sequences, and then concatenate and sort the corresponding groups.
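For the case where the element names are not in alphabetical order, the partition-and-concatenate idea can also be approximated in plain Java DOM rather than XSLT. The sketch below is an assumption-laden illustration, not the answer's XSLT: `GroupMerge` and `collect` are hypothetical names, and it assumes each element name forms one contiguous run per file, with the runs appearing in the same order in both files. It groups by element name in order of first appearance (a `LinkedHashMap`) and does not sort within a group, relying on the ids of the second file already being higher.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the question or the answer.
class GroupMerge {

    // Appends each element child of 'parent' to the group for its element name.
    // Groups keep the order in which their names were first seen.
    static void collect(Element parent, Document target, Map<String, List<Node>> groups) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and comments
            }
            // Nodes from the other document must be imported before they can be attached.
            Node owned = n.getOwnerDocument() == target ? n : target.importNode(n, true);
            groups.computeIfAbsent(n.getNodeName(), k -> new ArrayList<>()).add(owned);
        }
    }

    static String merge(String first, String second) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document target = db.parse(new InputSource(new StringReader(first)));
        Document other = db.parse(new InputSource(new StringReader(second)));
        Element root = target.getDocumentElement();

        Map<String, List<Node>> groups = new LinkedHashMap<>();
        collect(root, target, groups);                       // groups from the first file
        collect(other.getDocumentElement(), target, groups); // appended to matching groups

        // Rebuild the root: one group after another, file order kept inside each group.
        while (root.hasChildNodes()) {
            root.removeChild(root.getFirstChild());
        }
        for (List<Node> group : groups.values()) {
            for (Node n : group) {
                root.appendChild(n);
            }
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter w = new StringWriter();
        t.transform(new DOMSource(target), new StreamResult(w));
        return w.toString();
    }
}
```

Because the group order comes from the first file rather than from the names themselves, a template whose children are, say, `Z, Z, A` merged with `Z, A` would come out as `Z, Z, Z, A, A`. A name that occurs only in the second file ends up in a new group at the end, which may or may not be valid for the real schema.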

Michael Kay
That does sound promising. And the `A`/`B` and `id` attribute are indeed a simplification of course. In reality, in the `word/numbering.xml` files within a .docx zip file, `A` + `A.id` = `<w:abstractNum w:abstractNumId="...">` and `B` + `B.id` = `<w:num w:numId="...">`. In addition, both `A` and `B` will have child elements, and perhaps other attributes besides `w:abstractNumId`, that should remain unchanged. – Kevin Cruijssen Oct 06 '22 at 06:42