XML print each line in Bash

Question

I have an API that works fine. I pull XML data that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
  <buildings>
    <size>
      7
    </size>
      <building>
        <id>
          1
        </id>
          <name>
             First Building
          </name>
      </building>
      <building>
         <id>
           2
         </id>
           <name>
             Second Building
           </name>
      </building>
   </buildings>

I'm trying to print each building name on its own line:

First Building
Second Building

I've tried:

IDS=$(*fullcommand* | awk -F'>|<' '/<name>/ {print $3}')
printf '%s\n' $IDS

But this prints:

First
Building
Second
Building

Any help would be awesome. Thanks

Don't parse XML with awk; use an XML parser such as xmllint or xmlstarlet instead. They use XPath, a powerful expression language for selecting nodes, which makes the extraction far more robust. – kvantour, Nov 23 '19 at 08:22
Put `$IDS` in quotes. In general, you should always quote variables unless you have a good reason not to. – Barmar, Nov 23 '19 at 08:23
Try this more robust way: `xmlstarlet sel -t -m "//building" -v name -n file.xml`. Equivalent: [this](https://stackoverflow.com/questions/50484506/). The reason to use an XML parser is that an update to `fullcommand` might produce differently formatted XML output, on which the awk command could fail. – kvantour, Nov 23 '19 at 08:27
Does this answer your question? [How to extract multiple patterns between tokens at once with sed?](https://stackoverflow.com/questions/50484506/how-to-extract-multiple-patterns-between-tokens-at-once-with-sed) – kvantour, Nov 23 '19 at 08:30
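
The quoting advice in the comments above can be checked without the API. This is a minimal sketch, assuming the awk command returned the two names on separate lines; the value assigned to `IDS` is a hypothetical stand-in for that output.

```shell
# Stand-in for the text captured from the awk command (hypothetical data).
IDS='First Building
Second Building'

# Unquoted: the shell splits the value on spaces and newlines,
# so printf receives four separate arguments.
unquoted=$(printf '%s\n' $IDS)

# Quoted: printf receives one argument and the embedded newline survives.
quoted=$(printf '%s\n' "$IDS")

printf 'unquoted:\n%s\n\nquoted:\n%s\n' "$unquoted" "$quoted"
```

The unquoted form prints four lines (one per word), which matches the output in the question; the quoted form prints the two building names intact.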

score 1 · Answer 1 · answered Nov 23 '19 at 09:29

xmlstarlet select --template --match '//name' --value-of 'normalize-space()' -n file

Output:

First Building
Second Building

The normalize-space function strips leading and trailing whitespace from a string, replaces each run of whitespace characters with a single space, and returns the resulting string.
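
To see what that normalization does to the text of one `<name>` element, here is a rough approximation in plain shell. It is only an illustration, not how xmlstarlet implements the function: `normalize_space` is a hypothetical helper, and `[:space:]` covers slightly more characters than XPath counts as whitespace.

```shell
# Rough shell approximation of XPath normalize-space()
# (hypothetical helper for illustration, not xmlstarlet's implementation).
normalize_space() {
  # Turn every run of whitespace, newlines included, into one space,
  # then trim the single space that may remain at either end.
  printf '%s' "$1" | tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Text content of a <name> element as it appears in the sample XML,
# with an extra run of spaces between the words.
raw='
             First   Building
          '

result=$(normalize_space "$raw")
printf '[%s]\n' "$result"   # prints [First Building]
```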

score 0 · Answer 2 · answered Nov 23 '19 at 08:34

If you want to extract the text between <name> and </name>:

sed -n '/<name>/,/<\/name>/p' | grep -v '^ *<' | sed 's/^ *//'

answered by Yuji

[Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Nov 23 '19 at 08:38
It is better, but it depends on what user2720970 wants. – Yuji Nov 23 '19 at 08:41
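
For reference, the sed pipeline in Answer 2 reads from standard input, so it needs the XML piped into it. The sketch below feeds it a shortened copy of the sample from the question; the `xml` and `names` variables are placeholders for illustration. It works only while each tag sits on its own line, which is the fragility the comments warn about.

```shell
# Shortened copy of the sample XML from the question (stand-in for the API output).
xml='<?xml version="1.0" encoding="UTF-8"?>
  <buildings>
      <building>
          <name>
             First Building
          </name>
      </building>
      <building>
           <name>
             Second Building
           </name>
      </building>
   </buildings>'

names=$(printf '%s\n' "$xml" |
  sed -n '/<name>/,/<\/name>/p' |  # keep lines from <name> through </name>
  grep -v '^ *<' |                 # drop the lines that hold the tags themselves
  sed 's/^ *//')                   # strip leading indentation

printf '%s\n' "$names"
```

With this input the pipeline prints the two building names, one per line. If the API ever returned `<name>First Building</name>` on a single line, the grep step would discard it.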
