I have an API that works fine. I pull XML data that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
  <buildings>
    <size>
      7
    </size>
      <building>
        <id>
          1
        </id>
          <name>
             First Building
          </name>
      </building>
      <building>
         <id>
           2
         </id>
           <name>
             Second Building
           </name>
      </building>
   </buildings>

I'm trying to print each building name on its own line:

First Building
Second Building

I've tried:

IDS=$(*fullcommand* | awk -F'>|<' '/<name>/ {print $3}')
printf '%s\n' $IDS

But this prints:

First
Building
Second
Building

Any help would be awesome. Thanks

kvantour
user2720970
  • Don't parse XML with awk; use an XML parser such as xmllint or xmlstarlet instead. They select nodes with XPath, a powerful query language, which is far more robust than matching text patterns. – kvantour Nov 23 '19 at 08:22
  • Put `$IDS` in quotes. In general, you should always quote variables unless you have a good reason not to. – Barmar Nov 23 '19 at 08:23
  • It was that simple. I'm a noob. – user2720970 Nov 23 '19 at 08:25
  • Try this more robust approach: `xmlstarlet sel -t -m "//building" -v name -n file.xml`. A similar question: [this](https://stackoverflow.com/questions/50484506/). The reason to use an XML parser is that an update to `fullcommand` might produce differently formatted XML, on which a text-based command might fail. – kvantour Nov 23 '19 at 08:27
  • Does this answer your question? [How to extract multiple patterns between tokens at once with sed?](https://stackoverflow.com/questions/50484506/how-to-extract-multiple-patterns-between-tokens-at-once-with-sed) – kvantour Nov 23 '19 at 08:30
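
The quoting fix suggested in the comments can be sketched as follows. The `fullcommand` function here is a hypothetical stand-in for the real command, and it assumes each `<name>` element arrives on a single line (which is what the awk filter in the question needs in order to print anything):

```shell
# Hypothetical stand-in for *fullcommand*; assumes one <name> element per line.
fullcommand() {
  printf '%s\n' \
    '<buildings>' \
    '  <name>First Building</name>' \
    '  <name>Second Building</name>' \
    '</buildings>'
}

IDS=$(fullcommand | awk -F'>|<' '/<name>/ {print $3}')

# Unquoted: the shell splits $IDS on every space and newline,
# so printf receives four separate words.
unquoted=$(printf '%s\n' $IDS)

# Quoted: $IDS is passed as a single argument, so the embedded
# newline between the two names is preserved.
quoted=$(printf '%s\n' "$IDS")

printf '%s\n' "$quoted"
```

With the quotes in place, each name stays on its own line; without them, word splitting breaks every name at its spaces.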

2 Answers

xmlstarlet select --template --match '//name' --value-of 'normalize-space()' -n file

Output:

First Building
Second Building

The normalize-space function strips leading and trailing whitespace from a string, replaces each sequence of whitespace characters with a single space, and returns the resulting string.
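
To make that behaviour concrete, here is a rough shell approximation of what normalize-space does to a string. It is only an illustration: `normalize_space` is a hypothetical helper, not part of xmlstarlet, and it relies on the shell's default `IFS` (space, tab, newline).

```shell
# Hypothetical helper that mimics XPath normalize-space() on a single string.
normalize_space() {
  set -f        # disable globbing so characters like * stay literal
  set -- $1     # unquoted on purpose: split on spaces, tabs and newlines
  set +f
  printf '%s\n' "$*"   # rejoin the words with single spaces
}

# The text content of a <name> element as it appears in the question's XML.
raw='
             First Building
          '
normalize_space "$raw"   # prints: First Building
```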

Cyrus

If you just want the text between <name> and </name>, pipe the XML into:

sed -n '/<name>/,/<\/name>/p' | grep -v '^ *<' | sed 's/^ *//'
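
As a quick check, the pipeline can be run against a trimmed copy of the XML from the question. The `sample_xml` function is a hypothetical stand-in for the real command's output:

```shell
# Hypothetical stand-in that emits a trimmed copy of the question's XML.
sample_xml() {
cat <<'EOF'
<buildings>
  <building>
    <name>
      First Building
    </name>
  </building>
  <building>
    <name>
      Second Building
    </name>
  </building>
</buildings>
EOF
}

# 1. sed keeps only the lines from each <name> through its </name>.
# 2. grep drops the lines that begin with a tag.
# 3. the second sed strips the leading indentation.
names=$(sample_xml | sed -n '/<name>/,/<\/name>/p' | grep -v '^ *<' | sed 's/^ *//')
printf '%s\n' "$names"
```

Note that this depends on the layout: if a name were written on one line as `<name>First Building</name>`, the grep stage would discard it, which is the kind of fragility the comments warn about.
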
Yuji
  • [Don't Parse XML/HTML With Regex.](https://stackoverflow.com/a/1732454/3776858) I suggest using an XML/HTML parser (xmlstarlet, xmllint ...). – Cyrus Nov 23 '19 at 08:38
  • It is better, but it depends on what user2720970 wants. – Yuji Nov 23 '19 at 08:41