Versions of XMLStarlet found in current Linux distributions have a limit of 128 operations per xmlstarlet ed
invocation, and all versions are limited by the operating system's maximum command-line length. How can this be worked around?
Charles Duffy
-
Is this limit a problem for you, in practice? – npostavs Mar 28 '12 at 14:17
-
@npostavs Yes, it is. See my answer to http://stackoverflow.com/questions/9880808/shell-script-to-parse-csv-to-an-xml-query/9882015 for an example of a case where this is needed to process more than a handful of input lines. I've also hit this issue in commercial, production code (though the specific example in mind was later rewritten to do the relevant processing in XQuery rather than bash+xmlstarlet). – Charles Duffy Mar 28 '12 at 15:42
1 Answer
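The following shell function, xmlstarlet_ed (with an underscore, as distinct from the xmlstarlet ed command), breaks a long xmlstarlet edit list into a pipeline of shorter operations. It is recursive: once a batch is full, it runs that batch and pipes the result into another call to itself, which handles the remaining operations.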
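    xmlstarlet_max_commands=100 # max per instance; see http://sourceforge.net/tracker/?func=detail&aid=3488240&group_id=66612&atid=515106
    shopt -s extglob # enable +([0-9]) as an equivalent to the regex ^[[:digit:]]+$

    xmlstarlet_ed() {
      declare -a global_parameters=( )  # options repeated for every xmlstarlet instance
      declare -a parameters=( )         # edit operations for this instance
      declare -i num_commands=0
      # plain (non-integer) locals: with declare -i, a non-numeric value would be
      # evaluated arithmetically on assignment (often to 0) before the check below sees it
      declare cmd_len global_parameters_remaining

      # first argument is the number of global parameters that follow it
      global_parameters_remaining=$1; shift
      while (( global_parameters_remaining )); do
        global_parameters+=( "$1" ); shift
        (( global_parameters_remaining-- ))
      done

      # the rest are edit operations, each prefixed by its argument count
      while (( "$#" )) ; do
        cmd_len=$1; shift
        if ! [[ $cmd_len = +([0-9]) ]] ; then
          echo "ERROR: xmlstarlet_ed commands must be prefixed by run length" >&2
          return 1
        fi
        if (( num_commands < xmlstarlet_max_commands )) ; then
          parameters+=( "${@:1:$cmd_len}" )
          num_commands+=1
          shift "$cmd_len"
        else
          # batch is full: run it, and hand the remaining operations (including
          # the length just read) to a recursive call reading from the pipe;
          # only the recursive call takes the count of global parameters
          xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}" \
            | xmlstarlet_ed "${#global_parameters[@]}" "${global_parameters[@]}" "$cmd_len" "$@"
          return  # with the pipeline's exit status
        fi
      done

      if (( ${#parameters[@]} > 0 )) ; then
        xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}"
      else
        cat  # no operations: pass the input through unchanged
      fi
    }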
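The length-prefixed calling convention is easier to see in isolation. The sketch below is an illustration only: `split_runs` is a hypothetical helper, not part of the code above, that walks an argument list the same way the main loop does and prints one run per line. It does not call xmlstarlet at all.

```shell
#!/usr/bin/env bash
# split_runs: hypothetical helper, for illustration only.
# Reads "<count> <count words...>" groups and prints each group on one line.
split_runs() {
  local len
  while (( $# )); do
    len=$1; shift
    # "${*:1:$len}" joins the next $len positional parameters with spaces
    printf '%s\n' "${*:1:$len}"
    shift "$len" || return 1   # stop if a count overruns the argument list
  done
}

# Two operations: a 2-word delete and a 4-word update.
split_runs 2 -d /a/b 4 -u /a/c -v new
# prints:
#   -d /a/b
#   -u /a/c -v new
```

xmlstarlet_ed consumes its arguments in the same way, treating each run as one edit operation and each boundary between runs as a point where the pipeline may be split.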
It can be invoked as follows:

    # first list passed is global parameters; first the count, then the values
    # pass only a 0 if no global parameters are desired
    global_parameters=( 2 -N "xhtml=http://www.w3.org/1999/xhtml" )

    # build up the parameter list as length/command pairs; the lengths are used
    # to determine the potential split points between subprocesses
    parameters=( )
    while IFS= read -r; do  # one <line> element per line of stdin, read verbatim
      parameters+=( 8 -s /xhtml:html/xhtml:body -t elem -n line -v "$REPLY" )
    done

    # ...and actually invoke:
    xmlstarlet_ed "${global_parameters[@]}" "${parameters[@]}" \
      <<<"<html xmlns='http://www.w3.org/1999/xhtml'><body/></html>"
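Counting the words of each operation by hand (the literal 8 above) is easy to get wrong. One possible convenience, sketched here under the assumption that the list is kept in a global array named `parameters` as in the example, is a small wrapper that derives the run length from its own argument count. The name `add_cmd` is hypothetical and not part of xmlstarlet or of the function above.

```shell
#!/usr/bin/env bash
parameters=( )

# add_cmd: hypothetical helper. Appends one edit operation to the global
# "parameters" array, prefixed by the number of words it contains.
add_cmd() {
  parameters+=( "$#" "$@" )
}

add_cmd -s /xhtml:html/xhtml:body -t elem -n line -v "first line"
add_cmd -d "//xhtml:line[1]"

# Show the resulting list, one element per line, for inspection.
printf '%s\n' "${parameters[@]}"
```

Because `"$@"` preserves word boundaries, a value containing spaces (such as "first line") still counts as a single word, so the first operation is recorded with a run length of 8 and the second with 2.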

Charles Duffy
-
+1 Did not notice the difference between `xmlstarlet_ed` and `xmlstarlet ed` on the first reading. I am feeling that a brief notice saying that `xmlstarlet_ed` is a recursive function would enhance readability somewhat. – Dima Chubarov Aug 05 '14 at 14:13