
We have a fairly complex zsh application that uses XML files for storing its configuration and data. The current approach to reading from and writing to those files is using xmlstarlet.

When updating a file, we pipe the whole XML through xml multiple times, once for each attribute or element we touch, as follows:

cat "$config" \
| xml_addSubnode              "/a/b/c"                 "foo" \
| xml_createOrUpdateAttribute "/a/b/c/foo[last()]"     "attr1"  "zzzz" \
| xml_createOrUpdateAttribute "/a/b/c/foo[last()]"     "attr2"  "wwww" \
\
| xml_addSubnode              "/a/b/c/foo[last()]"     "bar" \
| xml_createOrUpdateAttribute "/a/b/c/foo[last()]/bar" "attr3"  "zzzz" \
| xml_createOrUpdateAttribute "/a/b/c/foo[last()]/bar" "attr4"  "kkkk" \
\
| xml_update "$config"

The attributes are read into shell variables by calling xml separately for each one:

local foo="$(xml_value "$xpath" "$config")"
local bar="$(xml_value "$xpath" "$config")"
...

The utility functions boil down to the following:

xml_addSubnode() {
    ...
    cat | xml ed -s "$elementXPath" -t elem -n "$element"
}

xml_createOrUpdateAttribute()
{
    ...
    cat | xml ed --update ... --insert ...
}

xml_value()
{
    ...
    xml sel -t -v "$xPath" "$xmlFile"
}

xml_update()
{
    ...
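    # write to a temporary file and rename it: the pipeline's first stage may
    # still be reading "$file" when a plain "cat > $file" truncates it
    cat > "$file.tmp" && mv -- "$file.tmp" "$file"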
}
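Most of the cost of `xml_value` is starting one xml process per value. A single `xml sel` call can return several values at once, because one template may contain several `-v` (value-of) and `-n` (newline) steps. The following is a minimal sketch, not part of the original code base: the helper name `xml_values` is hypothetical, it assumes each XPath selects at most one node and that no value contains a newline, and it keeps the `xml` binary name used in the question (elsewhere it may be installed as `xmlstarlet`). It is written to run in both zsh and bash.

```shell
# Hypothetical helper (not part of the original code base): fetch the values
# of several XPath expressions from one file with a single xml process.
# Prints one line per expression; an expression that matches nothing yields
# an empty line. Assumes each expression selects at most one node and that
# no value contains a newline.
xml_values() {
    local file=$1
    shift
    local -a args
    local xp
    for xp in "$@"; do
        # -v prints the value of the expression, -n ends it with a newline
        args+=(-v "$xp" -n)
    done
    xml sel -t "${args[@]}" "$file"
}
```

The values can then be read back in order within the current shell, for example `{ IFS= read -r foo; IFS= read -r bar; } < <(xml_values "$config" "$xpath1" "$xpath2")`, where `$xpath1` and `$xpath2` stand in for the XPaths elided above.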

This code works correctly, but the performance is, unsurprisingly, horrible: every helper call starts a separate xml process that parses the whole document again.

How can this code be made efficient? What other ways are there to parse XML with zsh or bash that would yield a faster execution?

Using another format is also an option, although it would require some migration effort. I know about the jq JSON processor, but its usage would be similar to xmlstarlet's, so I would not gain much if I followed the same approach, right?

The program runs on FreeBSD.

D-FENS
  • I just realized my question is also related to [this thread](https://stackoverflow.com/questions/4680143/how-to-parse-xml-using-shellscript). – D-FENS Dec 06 '21 at 11:11

1 Answer


You can do all the updating in a single pass with xmlstarlet, which will be much faster than calling it 6 times:

#!/usr/bin/env zsh

cat test.xml
print -- --------
xmlstarlet ed \
           -s '/a/b/c' -t elem -n foo \
           -s '/a/b/c/foo[last()]' -t attr -n attr1 -v zzzz \
           -s '/a/b/c/foo[last()]' -t attr -n attr2 -v wwww \
           -s '/a/b/c/foo[last()]' -t elem -n bar \
           -s '/a/b/c/foo[last()]/bar' -t attr -n attr3 -v zzzz \
           -s '/a/b/c/foo[last()]/bar' -t attr -n attr4 -v kkkk \
           test.xml

Example:

$ ./test.sh
<?xml version="1.0"?>
<a><b><c/></b></a>
--------
<?xml version="1.0"?>
<a>
  <b>
    <c>
      <foo attr1="zzzz" attr2="wwww">
        <bar attr3="zzzz" attr4="kkkk"/>
      </foo>
    </c>
  </b>
</a>
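To apply the single-pass idea without rewriting every call site at once, the existing helper style can be kept while the helpers only queue arguments. The following is a sketch under stated assumptions: `xml_batch_reset`, `xml_batch_addSubnode`, `xml_batch_setAttribute` and `xml_batch_commit` are hypothetical names, not part of the original code or of xmlstarlet. Edits are applied with one `xml ed -L` (`-L`/`--inplace` rewrites the file itself), and create-or-update is expressed as `-u` on the attribute followed by `-s` restricted to elements that lack it, mirroring the `--update ... --insert ...` pair in the question's helper. The binary name `xml` follows the question's code; it runs in zsh and bash.

```shell
# Hypothetical batching helpers (not part of the original code base or of
# xmlstarlet). The queueing functions only append arguments to an array;
# xml_batch_commit applies all of them with one "xml ed" process.
XML_BATCH=()

xml_batch_reset() {
    XML_BATCH=()
}

# Queue: append a child element named $2 to every node matching XPath $1.
xml_batch_addSubnode() {
    XML_BATCH+=(-s "$1" -t elem -n "$2")
}

# Queue: set attribute $2 to value $3 on nodes matching XPath $1.
# -u updates the attribute where it already exists; the following -s adds it
# only to matching elements that do not have it yet.
# ${1} and ${2} are braced so zsh does not read "[" as a subscript of $1.
xml_batch_setAttribute() {
    XML_BATCH+=(-u "${1}/@${2}" -v "$3"
                -s "${1}[not(@${2})]" -t attr -n "$2" -v "$3")
}

# Apply all queued edits to file $1 in place (-L) with a single process,
# then clear the queue. Does nothing if the queue is empty.
xml_batch_commit() {
    local file=$1
    (( ${#XML_BATCH[@]} )) || return 0
    xml ed -L "${XML_BATCH[@]}" "$file" || return
    xml_batch_reset
}
```

A call site would then consist of `xml_batch_reset`, the same sequence of `xml_batch_addSubnode`/`xml_batch_setAttribute` calls as the original pipeline, and one `xml_batch_commit "$config"`, so the whole update costs one process instead of one per element or attribute.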
Shawn
  • Thank you for the answer; this is indeed a great optimization, and I will try to implement it. Do you know any technique with which I could parse XML in pure zsh or pure bash? Maybe a parsing function implemented in shell only? It's a long shot, but our app does thousands of reads and writes to those files, so it would be a big time saver. If that's not possible, we could reimplement it in a non-scripted language, but the code base is already quite large. I'll wait for a while, and if no other solution is presented, I'll accept your answer. – D-FENS Dec 06 '21 at 11:08
  • I accepted your answer, however I would love to see even more optimized solutions if at all possible. – D-FENS Dec 06 '21 at 17:37
  • It would probably need to be rewritten entirely in a more efficient language that has an XML parser available: Perl, Tcl, Python, etc. Maybe even PowerShell, if it's been ported to FreeBSD? – Shawn Dec 06 '21 at 18:12
  • At the moment we don't have the resources. It's a 40k-line code base. I guess we'll put up with combining the xml calls as you proposed until we have more time to reimplement it. – D-FENS Dec 06 '21 at 19:10
  • A 40k line shell script? Ouch. – Shawn Dec 06 '21 at 19:36
  • It's modular: ca. 300 include files. It actually works quite well; I would never have thought that zsh was so scalable. – D-FENS Dec 06 '21 at 20:27