Versions of XMLStarlet found in current Linux distributions have a limit of 128 operations per xmlstarlet ed invocation, and all versions are limited by the operating system's maximum command-line length. How can this be worked around?

Charles Duffy
  • Is this limit a problem for you, in practice? – npostavs Mar 28 '12 at 14:17
  • @npostavs Yes, it is. See my answer to http://stackoverflow.com/questions/9880808/shell-script-to-parse-csv-to-an-xml-query/9882015 for an example of a case where this is needed to process more than a handful of input lines. I've also hit this issue in commercial, production code (though the specific example in mind was later rewritten to do the relevant processing in XQuery rather than bash+xmlstarlet). – Charles Duffy Mar 28 '12 at 15:42

1 Answer

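The following recursive shell function, xmlstarlet_ed (with an underscore, as distinct from the command xmlstarlet ed), breaks long xmlstarlet edit lists into a pipeline of shorter invocations: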

xmlstarlet_max_commands=100 # max per instance; see http://sourceforge.net/tracker/?func=detail&aid=3488240&group_id=66612&atid=515106
shopt -s extglob # enable +([0-9]) as an equivalent to the regex ^[[:digit:]]+$

xmlstarlet_ed() {
  declare -a global_parameters
  declare -a parameters
  declare -i num_commands
  declare -i cmd_len

  global_parameters=( )
  parameters=( )
  num_commands=0

  local global_parameters_remaining=$1; shift  # keep the counter out of the global scope

  while (( global_parameters_remaining )); do
    global_parameters+=( "$1" ); shift
    (( global_parameters_remaining-- ))
  done

  while (( "$#" )) ; do
    cmd_len=$1; shift
    if ! [[ $cmd_len = +([0-9]) ]] ; then
      echo "ERROR: xmlstarlet_ed commands must be prefixed by run length" >&2
      return 1
    fi

    if (( num_commands < xmlstarlet_max_commands )) ; then
      parameters+=( "${@:1:$cmd_len}" )
      num_commands+=1
      shift $cmd_len
    else
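      # batch is full: run it, then recurse on the remaining commands via the pipe;
      # only the recursive call takes the global-parameter count as its first argument
      xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}" \
        | xmlstarlet_ed "${#global_parameters[@]}" "${global_parameters[@]}" "$cmd_len" "$@"
      return  # propagate the exit status of the pipeline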
    fi
  done

  if (( ${#parameters[@]} > 0 )) ; then
    xmlstarlet ed "${global_parameters[@]}" "${parameters[@]}"
  else
    cat
  fi
}
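
To see where the split points fall without running xmlstarlet at all, the batching logic can be sketched as a dry run. This is only an illustration: show_batches is a hypothetical helper, not part of the function above, and it prints each batch instead of piping it through xmlstarlet ed. It assumes the same length-prefixed argument convention, with the maximum batch size passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper (an assumption for illustration only):
# prints how a length-prefixed command list would be grouped into batches.
show_batches() {
  local max=$1; shift            # maximum number of commands per batch
  local -i count=0 batch=1 len
  local -a current=( )

  while (( $# )); do
    len=$1; shift                # run length of the next command
    if (( count == max )); then  # batch is full: emit it and start a new one
      printf 'batch %d: %s\n' "$batch" "${current[*]}"
      current=( ); count=0; batch+=1
    fi
    current+=( "${@:1:$len}" )   # take exactly $len words for this command
    count+=1
    shift "$len"
  done

  if (( count > 0 )); then       # emit the final, possibly partial batch
    printf 'batch %d: %s\n' "$batch" "${current[*]}"
  fi
}

show_batches 2 \
  2 -d //a \
  2 -d //b \
  2 -d //c
# prints:
#   batch 1: -d //a -d //b
#   batch 2: -d //c
```

In the real function each batch becomes one xmlstarlet ed process, with the output of one batch feeding the input of the next.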

It can be invoked like so:

# first list passed is global parameters; first the count, then the values
# pass only a 0 if no global parameters are desired
global_parameters=( 2 -N "xhtml=http://www.w3.org/1999/xhtml" )

# build up the parameter list as length/command pairs; the lengths are used
# to determine the potential split points between subprocesses
parameters=( )
while IFS= read -r; do  # -r keeps backslashes in the input literal
  parameters+=( 8 -s /xhtml:html/xhtml:body -t elem -n line -v "$REPLY" )
done

# ...and actually invoke:
xmlstarlet_ed "${global_parameters[@]}" "${parameters[@]}" \
 <<<"<html xmlns='http://www.w3.org/1999/xhtml'><body/></html>"
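
Counting the words of each command by hand (the 8 in the example) is easy to get wrong. One possible convenience, sketched here under the assumption that a small wrapper is acceptable, is to let the shell compute the prefix. The name add_command is hypothetical and not part of the original answer; the here-document stands in for whatever input is being read.

```shell
#!/usr/bin/env bash
parameters=( )

# Hypothetical helper (an assumption for illustration): append one edit
# command to the list, prefixed by its own word count ("$#").
add_command() {
  parameters+=( "$#" "$@" )
}

while IFS= read -r line; do
  add_command -s /xhtml:html/xhtml:body -t elem -n line -v "$line"
done <<'EOF'
first line
second line
EOF

printf '%s\n' "${#parameters[@]}"   # prints 18: two commands of 1 + 8 words each
```

The resulting parameters array has the same length/command layout as the hand-built one, so it could be passed to xmlstarlet_ed in the same way.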
Charles Duffy
  • +1 Did not notice the difference between `xmlstarlet_ed` and `xmlstarlet ed` on the first reading. I am feeling that a brief notice saying that `xmlstarlet_ed` is a recursive function would enhance readability somewhat. – Dima Chubarov Aug 05 '14 at 14:13