Suppose I have an xml file:

<?xml version='1.0' encoding='utf-8' standalone='yes' ?>
<map>
    <string name="a"></string>
</map>

And I want to set the value of the string element whose name attribute is "a" to something big:

$ xmlstarlet ed -u '/map/string[@name="a"]' -v $(for ((i=0;i<200000;i++)); do echo -n a; done) example.xml > o.xml

This fails with the error "Argument list too long". I was unable to find an xmlstarlet option that reads the value from a file. So how can I set the value of an XML tag to 200 KB or more of data?
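
For context, the limit behind this error can be inspected from the shell. The following is a minimal sketch, assuming a POSIX getconf and a head that supports -c; the variable names are illustrative. On Linux with 4 KiB pages a single argument is additionally capped at 128 KiB, which is why one 200 KB value fails even when ARG_MAX itself is larger.

```shell
#!/usr/bin/env bash
# Upper bound, in bytes, on arguments plus environment passed to execve().
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX = ${arg_max}"

# Build a 200000-character value without putting it on any command line.
big=$(head -c 200000 /dev/zero | tr '\0' 'a')

# printf is a shell builtin, so no execve() happens and no limit applies.
printf '%s' "$big" | wc -c

# An external command given the same value as an argument is expected to
# fail on Linux with "Argument list too long":
#   /bin/echo "$big" > /dev/null
```

Anything that moves the value through a file or a pipe, rather than through an argument, sidesteps the limit.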

Solution

I tried feeding the data to xmlstarlet in chunks with the -a (append) option, but ran into further difficulties, such as escaping special characters and the order in which xmlstarlet applies the chunks. Eventually I fell back to simpler tools: xml2, sed and 2xml. I have posted the code as a separate answer below.

reardenlife
  • If you want a hack, set it to some string that you are sure does not exist in the XML (e.g. `THIS_STRING_DOES_NOT_EXIST`) and then replace `THIS_STRING_DOES_NOT_EXIST` with your intended string using sed or similar tools. – anishsane Aug 05 '19 at 05:40
  • @anishsane I am just trying to find a legit way to edit xml. – reardenlife Aug 05 '19 at 12:30
  • You could use the -a option of xmlstarlet (append) instead of -v, cut your command down to, say, 20000 instead of 200000, and loop 10 times over that. So you would append 20000 each loop. –  Aug 05 '19 at 12:53
  • @Roadowl interesting proposal. I would not call it a legit way though. :) – reardenlife Aug 05 '19 at 13:07
  • @Roadowl any suggestions on how I can do that in bash? I tried a combination of xargs, read, echo and cut, without success. Let's assume that we are working with real-world data, not just a string of 'a' – reardenlife Aug 09 '19 at 03:42
  • @Roadowl what do you mean by "instead of -v"? It seems that -a must be used in conjunction with it. Can you provide POC of your proposal? – reardenlife Aug 09 '19 at 14:42
  • Heh. I asked this same question and answered it myself some years back... – Charles Duffy Aug 09 '19 at 15:35
  • @reardenlife "Lets assume that we are working with real world data" effectively makes this a new question, with new conditions and new criteria. No fair! –  Aug 09 '19 at 15:37
  • ...that said, 200KB is simply *too long* for many operating systems -- there's an OS-enforced limit on the amount of data that can be in (combined) environment variable and command-line space. – Charles Duffy Aug 09 '19 at 15:42
  • @reardenlife, ...which is to say that you may need to reach for different tools, to insert a single huge element rather than a large number of smaller ones. Might I suggest the excellent XML libraries Python ships with? – Charles Duffy Aug 09 '19 at 15:43
  • @reardenlife, ...to be clear: This isn't a bash error; it's an operating-system error; the `execve()` syscall fails when it's passed content that can't be fit in the relevant memory region, no matter what the invoking language is. – Charles Duffy Aug 09 '19 at 15:47
  • For anyone trying to add *many* new elements, as opposed to *a single long* element, see the near-duplicate [handling long edit lists in xmlstarlet](https://stackoverflow.com/questions/9898939/handling-long-edit-lists-in-xmlstarlet). – Charles Duffy Aug 09 '19 at 15:49
  • I ended up using xml2/2xml and sed. :) – reardenlife Aug 09 '19 at 16:27
  • @reardenlife, ...why not add your own answer showing how you did that? It'd likely be useful to other readers. – Charles Duffy Aug 09 '19 at 21:00
  • @Charles Duffy. Done. But I doubt it would be useful for anyone. :) – reardenlife Aug 10 '19 at 02:35
  • @Cyrus but there is simply no decent solution to be accepted. So what's the point of voting or accepting then? – reardenlife Oct 12 '19 at 23:03
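
The placeholder idea from the first comment only avoids the limit if the replacement itself never appears on a command line. The following is a minimal sketch under these assumptions: the element was first set to a placeholder token (for example with xmlstarlet ed -u ... -v), the real value lives in a file, and only &, < and > need escaping. replace_placeholder is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# replace_placeholder XML_FILE TOKEN VALUE_FILE   (hypothetical helper)
# Print XML_FILE with every TOKEN replaced by the XML-escaped contents of
# VALUE_FILE. The value travels through files only, never through argv.
replace_placeholder() {
    local xml=$1 token=$2 value=$3
    awk -v token="$token" -v valfile="$value" '
        BEGIN {
            # Read the value line by line, escaping & first, then < and >.
            while ((getline line < valfile) > 0) {
                gsub(/&/, "\\&amp;", line)
                gsub(/</, "\\&lt;", line)
                gsub(/>/, "\\&gt;", line)
                val = val sep line
                sep = "\n"
            }
            close(valfile)
        }
        {
            # Splice the value in wherever the token occurs on this line.
            out = ""
            while ((i = index($0, token)) > 0) {
                out = out substr($0, 1, i - 1) val
                $0 = substr($0, i + length(token))
            }
            print out $0
        }
    ' "$xml"
}

# Example run on throwaway files.
tmp=$(mktemp -d)
printf '<map>\n    <string name="a">THIS_STRING_DOES_NOT_EXIST</string>\n</map>\n' > "$tmp/a.xml"
printf 'x < y && y > z\n' > "$tmp/source.txt"
replace_placeholder "$tmp/a.xml" THIS_STRING_DOES_NOT_EXIST "$tmp/source.txt"
rm -rf "$tmp"
```

This is plain text substitution, not XML processing, so it is only safe when the token cannot occur anywhere else in the document.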

3 Answers

Here is a workaround for your example, which fails because of the ARG_MAX limit: build the value in smaller chunks.

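#!/bin/bash
# (remove the 'echo' commands when it looks good)

# Work on a copy so that example.xml stays untouched
echo cp example.xml o.xml

# One 2000-character chunk, built once
chunk=$(for ((j = 0; j < 2000; j++)); do echo -n a; done)

# Each pass appends one chunk to the element's current text via XPath concat();
# the inner loop above uses j so that it cannot clobber this loop's counter.
# The chunk must not contain an apostrophe, because it is quoted with them.
for ((i = 0; i < 100; i++))
do
    echo xmlstarlet ed -L -u '/map/string[@name="a"]' -x "concat(., '${chunk}')" o.xml
done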

SOLUTION

I am not proud of it, but at least it works.

Files:

  • a.xml - the example file from the question
  • source.txt - the text to insert into a.xml as the value of the XML tag
  • b.xml - the output

#!/usr/bin/env bash
ixml="a.xml"
oxml="b.xml"
s="source.txt"
echo "$ixml --> $oxml"

t="$ixml.xml2"
t2="$ixml.xml2.edited"
t3="$ixml.2xml"

# Convert the XML into xml2's flat, line-oriented representation
xml2 < "$ixml" > "$t"

# Find the line number of the tag of interest, add one, and delete from that line to the end
# For this to work, the tag of interest must be the last element in the XML file
grep -n -E 'string.*name=.*a' "$t" | cut -f1 -d: | xargs -I{} echo "{}+1" | bc | xargs -I{} sed '{},$d' "$t" > "$t2"
# Append the escaped contents of the source file as the tag's value, then convert everything back to XML
# * Escaping apostrophes is necessary for APK XML files
sed "s:':\\\':g" "$s" | sed -e 's:^:/map/string=:' >> "$t2"
2xml < "$t2" > "$t3"
# Pretty-print the XML
xmllint --pretty 1 --encode utf-8 "$t3" > "$oxml"

# Delete temporary files
rm -f "$t"
rm -f "$t2"
rm -f "$t3"
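
The quoting in the apostrophe-escaping sed command is easy to misread, so here is a small check of what it does, using a made-up input string. It involves nothing beyond the shell and sed.

```shell
#!/usr/bin/env bash
# Inside double quotes the shell turns \\ into \ and leaves \' alone, so sed
# receives the script  s:':\\':g  and replaces every apostrophe with a
# backslash followed by an apostrophe.
escaped=$(printf "it's a 'test'\n" | sed "s:':\\\':g")
printf '%s\n' "$escaped"
# prints: it\'s a \'test\'
```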
reardenlife

This is a matter of merging XML with XML, or XML with text. xmlstarlet's transform command can do this by performing XInclude processing. Merging XML with XML can alternatively be done with its select and edit commands (the combine-extract method).

The examples below use these 2 data files:

  • file1.xml - the main file to which stuff is added: <map><string name="a"></string></map>
  • file2.xml - the part file from which stuff is copied: <doc><g><g1/><g2/><g3/><g4/></g></doc>

First, the XInclude method:

# shellcheck  shell=sh  disable=SC2016,SC2064
mainfile='file1.xml'
partfile='file2.xml'
mainxpath='/map/string[@name="a"]'
partxpath='/doc/g/*'

mainftmp="$(mktemp)"
partftmp="$(mktemp)"
trap "rm -f -- '${mainftmp}' '${partftmp}'" INT EXIT
cp -- "${partfile}" "${partftmp}"

xmlstarlet edit \
  -s "${mainxpath}" -t 'elem' -n 'xi:include' \
  --var V '$xstar:prev' \
  -s '$V' -t 'attr' -n 'xmlns:xi' -v 'http://www.w3.org/2001/XInclude' \
  -s '$V' -t 'attr' -n 'href' -v "${partftmp}" \
  -s '$V' -t 'attr' -n 'xpointer' -v "xpointer(${partxpath})" \
"${mainfile}" > "${mainftmp}"
xmlstarlet select -C -t -c / |
xmlstarlet transform --xinclude /dev/stdin "${mainftmp}"

where:

  • the mainxpath shell variable holds the XPath expression of the destination element in the main file, and partxpath selects the nodes to extract from the part file
  • mktemp creates temporary files with absolute pathnames, and trap deletes them after use
  • xmlstarlet edit is invoked to modify the main file:
    • the 4 -s (aka --subnode) options add an xi:include element to the destination element:
      <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="/path/to/partftmp" xpointer="xpointer(/doc/g/*)"/>
    • the XPointer expression gives the XPath of the nodes to include from the part file; complex expressions such as unions are possible here
    • --var defines a named variable; the back reference variable prev (aka xstar:prev) refers to the node(s) created by the most recent -s, -i, or -a option, each of which defines or redefines it (see xmlstarlet.txt for examples of --var and $prev)
  • xi:include elements may appear in both the main file and the XML part file(s)
  • xmlstarlet transform --xinclude does the XInclude processing while applying an XSLT stylesheet (generated on the fly by xmlstarlet select -C) that duplicates its input by copying the root node /

Output:

<map>
  <string name="a">
    <g1/><g2/><g3/><g4/>
  </string>
</map>

Merging XML and text: replace the 4th -s action (xpointer="…") in the edit command above with -s '$V' -t 'attr' -n 'parse' -v 'text'. The entire part file is then included as text, with the special XML characters escaped automatically, which generates the following output:

<map>
  <string name="a">
    &lt;doc&gt;&lt;g&gt;&lt;g1/&gt;&lt;g2/&gt;&lt;g3/&gt;&lt;g4/&gt;&lt;/g&gt;&lt;/doc&gt;
  </string>
</map>

Second, the combine-extract method:

# shellcheck  shell=sh  disable=SC2016
mainfile='file1.xml'
partfile='file2.xml'
mainxpath='/map/string[@name="a"]'
partxpath='/doc/g/*'

xmlstarlet select -R -t \
  --var part -o "${partfile}" -b \
  -c ' / | document($part)' "${mainfile}" |
xmlstarlet edit -m '/xsl-select'"${partxpath}" '/xsl-select'"${mainxpath:-/..}" |
xmlstarlet select -B -I -t -c '/xsl-select/*[1]'

where:

  • the first select copies the 2 documents and wraps them (-R) as /xsl-select/*[1] and /xsl-select/*[2], using the XSLT document function to access the part file; either the main file or the part file can be /dev/stdin
  • edit moves the grandchildren of ${partfile}'s root element to ${mainxpath}, where the incoming nodes are appended as the last children
  • if ${mainxpath} is unset or empty, the fallback value /.. matches nothing and edit reports an error, so a real destination must be supplied
  • the final select extracts and formats the merged document

Output:

<map>
  <string name="a">
    <g1/>
    <g2/>
    <g3/>
    <g4/>
  </string>
</map>

Lastly, if 200000 "a"s are in fact required, the EXSLT str:padding function is useful for repeating a character:

xmlstarlet edit \
  --var T 'str:padding(100000,"a")' \
  -u 'map/string[@name="a"]' -x 'concat($T,$T)' \
file1.xml

Note that libexslt (the library, not the EXSLT specification) limits the length of str:padding output to 100000 characters, which is why the example concatenates two such strings.
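
For lengths other than 200000, the same trick generalizes by concatenating as many str:padding calls as needed. The following is a minimal sketch, assuming the 100000-character limit stated above; padding_expr is a hypothetical helper that only builds the XPath expression as a string.

```shell
#!/usr/bin/env bash
# padding_expr N CHAR   (hypothetical helper)
# Print an XPath expression that yields CHAR repeated N times, using
# str:padding calls of at most 100000 characters each.
padding_expr() {
    local n=$1 ch=$2 limit=100000 len joined
    local parts=()
    while (( n > 0 )); do
        len=$(( n < limit ? n : limit ))
        parts+=("str:padding(${len},\"${ch}\")")
        n=$(( n - len ))
    done
    case ${#parts[@]} in
        0) printf '""\n' ;;                    # nothing to repeat
        1) printf '%s\n' "${parts[0]}" ;;      # concat() needs 2+ arguments
        *) joined=$(IFS=,; printf '%s' "${parts[*]}")
           printf 'concat(%s)\n' "$joined" ;;
    esac
}

padding_expr 250000 a
# prints: concat(str:padding(100000,"a"),str:padding(100000,"a"),str:padding(50000,"a"))
```

The resulting expression is short no matter how long the value is, so passing it as "$(padding_expr 250000 a)" to the -x option in the command above should stay well under the argument limit.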

urznow