It's a question of merging XML and XML, or XML and text. This can be
done by having xmlstarlet's transform command perform XInclude processing.
Merging XML and XML can optionally be done with its select and edit
commands (combine-extract method).
These 2 data files are used in the following:
file1.xml - the main file to which stuff is added:
<map><string name="a"></string></map>
file2.xml - the part file from which stuff is copied:
<doc><g><g1/><g2/><g3/><g4/></g></doc>
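To follow along, the two data files can be created like this. This is only a convenience sketch; it assumes the current directory is writable and that overwriting any existing file1.xml and file2.xml there is acceptable.

```shell
# Create the two sample files used in the rest of this answer.
# NOTE: this overwrites file1.xml and file2.xml in the current directory.
printf '%s\n' '<map><string name="a"></string></map>' > file1.xml
printf '%s\n' '<doc><g><g1/><g2/><g3/><g4/></g></doc>' > file2.xml
```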
First, the XInclude method:
# shellcheck shell=sh disable=SC2016,SC2064
mainfile='file1.xml'
partfile='file2.xml'
mainxpath='/map/string[@name="a"]'
partxpath='/doc/g/*'
mainftmp="$(mktemp)"
partftmp="$(mktemp)"
trap "rm -f -- '${mainftmp}' '${partftmp}'" INT EXIT
cp -- "${partfile}" "${partftmp}"
xmlstarlet edit \
-s "${mainxpath}" -t 'elem' -n 'xi:include' \
--var V '$xstar:prev' \
-s '$V' -t 'attr' -n 'xmlns:xi' -v 'http://www.w3.org/2001/XInclude' \
-s '$V' -t 'attr' -n 'href' -v "${partftmp}" \
-s '$V' -t 'attr' -n 'xpointer' -v "xpointer(${partxpath})" \
"${mainfile}" > "${mainftmp}"
xmlstarlet select -C -t -c / |
xmlstarlet transform --xinclude /dev/stdin "${mainftmp}"
where:
- the mainxpath shell variable holds the XPath expression which
  points within the main file, i.e. the destination XML element to
  add stuff to, and partxpath specifies the nodes to extract from
  the part file
- mktemp creates absolute pathnames for temporary files,
  trap deletes them after use
- xmlstarlet edit is invoked to modify the main file:
  - the 4 -s (aka --subnode) actions add an xi:include
    element to the destination element:
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="/path/to/partftmp" xpointer="xpointer(/doc/g/*)"/>
  - the XPointer expression specifies the XPath of the nodes to include
    from the part file; it's possible to use complex expressions here
    such as unions
  - --var defines a named variable, and the back reference variable prev
    (aka xstar:prev) refers to the node(s) created by the most
    recent -s, -i, or -a option, which all define or redefine it
    (see xmlstarlet.txt for examples of --var and $prev)
- xi:include elements may appear in both the main file and XML part file(s)
- xmlstarlet transform --xinclude does the XInclude processing using
  an XSLT stylesheet (generated on the fly by xmlstarlet select)
  which duplicates its input by copying the root node /
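As a sketch of the union case mentioned above, the same steps can be wrapped in a shell function so that the part XPath is easy to vary. The function name xinclude_merge and its argument order are made up for this illustration; they are not part of xmlstarlet. The commands inside are the ones from the script above, with the trap replaced by explicit cleanup.

```shell
# Hypothetical wrapper around the XInclude method (not part of xmlstarlet).
# Usage: xinclude_merge MAINFILE PARTFILE MAINXPATH PARTXPATH
# Example with a union, picking only g1 and g3 from the part file:
#   xinclude_merge file1.xml file2.xml '/map/string[@name="a"]' '/doc/g/g1|/doc/g/g3'
xinclude_merge() {
  _main=$1 _part=$2 _mainxpath=$3 _partxpath=$4
  _mainftmp=$(mktemp) || return
  _partftmp=$(mktemp) || { rm -f -- "${_mainftmp}"; return 1; }

  # Copy the part file to an absolute path, add the xi:include element
  # to the main file, then let transform resolve the inclusion.
  cp -- "${_part}" "${_partftmp}" &&
  xmlstarlet edit \
    -s "${_mainxpath}" -t 'elem' -n 'xi:include' \
    --var V '$xstar:prev' \
    -s '$V' -t 'attr' -n 'xmlns:xi' -v 'http://www.w3.org/2001/XInclude' \
    -s '$V' -t 'attr' -n 'href' -v "${_partftmp}" \
    -s '$V' -t 'attr' -n 'xpointer' -v "xpointer(${_partxpath})" \
    "${_main}" > "${_mainftmp}" &&
  xmlstarlet select -C -t -c / |
  xmlstarlet transform --xinclude /dev/stdin "${_mainftmp}"
  _status=$?

  # Remove the temporary files whether or not the merge succeeded.
  rm -f -- "${_mainftmp}" "${_partftmp}"
  return "${_status}"
}
```

The temporary files are removed at the end of the function instead of in a trap, so the wrapper does not clobber traps set by the calling script; the price is that an interrupt in the middle of a call may leave them behind.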
Output:
<map>
<string name="a">
<g1/><g2/><g3/><g4/>
</string>
</map>
Merging XML and text: if the 4th -s action (xpointer="…") in the
edit command above is replaced with
-s '$V' -t 'attr' -n 'parse' -v 'text'
the entire part file is parsed as text and the special XML
characters are automatically escaped, generating the following output:
<map>
<string name="a">
&lt;doc&gt;&lt;g&gt;&lt;g1/&gt;&lt;g2/&gt;&lt;g3/&gt;&lt;g4/&gt;&lt;/g&gt;&lt;/doc&gt;
</string>
</map>
Second, the combine-extract method:
# shellcheck shell=sh disable=SC2016
mainfile='file1.xml'
partfile='file2.xml'
mainxpath='/map/string[@name="a"]'
partxpath='/doc/g/*'
xmlstarlet select -R -t \
--var part -o "${partfile}" -b \
-c ' / | document($part)' "${mainfile}" |
xmlstarlet edit -m '/xsl-select'"${partxpath}" '/xsl-select'"${mainxpath:-/..}" |
xmlstarlet select -B -I -t -c '/xsl-select/*[1]'
- invoke select to copy the 2 documents and wrap them (-R) as
  /xsl-select/*[1] and /xsl-select/*[2], using the XSLT document
  function to access the part file; either the main file or the part
  file can be /dev/stdin
- call edit to move grandchildren of ${partfile}'s root element to
  ${mainxpath}; incoming nodes will be appended as last nodes there
  - the default ${mainxpath} value (/..), used when the variable is
    unset or empty, causes an error to be generated, so mainxpath
    must be set
- invoke select to extract and format the merged document
Output:
<map>
<string name="a">
<g1/>
<g2/>
<g3/>
<g4/>
</string>
</map>
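Since either file can be /dev/stdin, the combine-extract pipeline also works as a filter. The following is a minimal sketch that reads the main document from standard input; the function name combine_extract and its argument order are invented here for illustration and are not part of xmlstarlet.

```shell
# Hypothetical wrapper around the combine-extract method (not part of xmlstarlet).
# Usage: combine_extract PARTFILE MAINXPATH PARTXPATH < MAINFILE
# Example:
#   combine_extract file2.xml '/map/string[@name="a"]' '/doc/g/*' < file1.xml
combine_extract() {
  # 1. wrap main document (stdin) and part document ($1) under /xsl-select
  # 2. move the nodes selected by $3 to the destination selected by $2;
  #    an empty $2 falls back to /.. and makes edit fail, as in the script above
  # 3. extract the merged main document and indent it
  xmlstarlet select -R -t \
    --var part -o "$1" -b \
    -c ' / | document($part)' /dev/stdin |
  xmlstarlet edit -m '/xsl-select'"$3" '/xsl-select'"${2:-/..}" |
  xmlstarlet select -B -I -t -c '/xsl-select/*[1]'
}
```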
Lastly, if 200000 "a"s are in fact required, the EXSLT str:padding
function is useful for character repetition:
xmlstarlet edit \
--var T 'str:padding(100000,"a")' \
-u 'map/string[@name="a"]' -x 'concat($T,$T)' \
file1.xml
Note that libexslt (not EXSLT) limits the length of str:padding
output to 100000 (one hundred thousand), which is why the command
above concatenates two 100000-character strings.
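To check that the padding really produced the intended number of characters, the edited document can be piped into select and measured with the XPath string-length function. This is a small sketch; the helper name fill_and_measure is made up for the example.

```shell
# Hypothetical helper: fill <string name="a"> with two 100000-character
# paddings and print the length of the resulting text.
# Usage: fill_and_measure file1.xml
fill_and_measure() {
  xmlstarlet edit \
    --var T 'str:padding(100000,"a")' \
    -u 'map/string[@name="a"]' -x 'concat($T,$T)' \
    "$1" |
  xmlstarlet select -t -v 'string-length(/map/string[@name="a"])' -n
}
```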