I need to extract the name value (Product Finder) from this XML:

File: config.xml

<?xml version="1.0" encoding="utf-8"?>
<widget id="com.abc.app" version="1.3.1" xmlns="http://www.w3.org/ns/widgets" xmlns:android="http://schemas.android.com/apk/res/android" xmlns:cdv="http://cordova.apache.org/ns/1.0" ios-CFBundleVersion="1.3.1.5" android-versionCode="5">
    <name>Product Finder</name>
    <description>
        Description
    </description>
</widget>

I've tried:

mles$ cat config.xml | grep '<name>'
    <name>Product Finder</name>

Some other answers suggest using grep -oPm1 "(?<=<xmltag>)[^<]+", but that yields an error:

mles$ cat config.xml | grep -oPm1 "(?<=<name>)[^<]+"
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
[-e pattern] [-f file] [--binary-files=value] [--color=when]
[--context[=num]] [--directories=action] [--label] [--line-buffered]
[--null] [pattern] [file ...]

How can I get the name value? I need a solution without dependencies, so grep would be preferred.

mles

5 Answers

grep only finds the line; you need an additional tool to extract the name, such as sed (which is not an additional dependency):

grep '<name>' config.xml | sed "s@.*<name>\(.*\)</name>.*@\1@"

Here sed captures everything between <name> and </name> and replaces the whole line with the captured text.
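
A sed-only variant of the above, as a minimal sketch: the -n option together with the p flag makes sed print only the lines where the substitution matched, so the separate grep is not needed. The extract_name helper and the inline sample are illustrative assumptions, not part of the answer, and this still only works while the opening and closing tags sit on the same line.

```shell
# Sample input mirroring the question's config.xml (attributes shortened; an assumption).
xml='<?xml version="1.0" encoding="utf-8"?>
<widget id="com.abc.app" version="1.3.1" xmlns="http://www.w3.org/ns/widgets">
    <name>Product Finder</name>
    <description>
        Description
    </description>
</widget>'

# Hypothetical helper: read XML on stdin, print the text of the first one-line <name> element.
# -n suppresses sed's default output; the trailing p prints only lines where the substitution matched.
extract_name() {
    sed -n 's@.*<name>\(.*\)</name>.*@\1@p' | head -n 1
}

name=$(printf '%s\n' "$xml" | extract_name)
printf '%s\n' "$name"
```

Against the real file this would be called as extract_name < config.xml.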

Rogus
  • Why would you people recommend non-XML-aware tools when you can use `xmllint` or `xmlstarlet`? – Inian Apr 28 '17 at 11:08
  • How would you remove the remaining whitespace? – mles Apr 28 '17 at 12:48
  • @mles just added `.*` before and after the tags to remove whitespace tabs etc. – Rogus Apr 28 '17 at 15:19
  • @Inian I wasn't actually aware that there are XML parsers available without installing additional packages. And since OP wanted no dependencies and used `grep` in his question I figured that `sed` is all OP needs in this case. Thanks for commenting, reading `man xmllint` right now :) – Rogus Apr 28 '17 at 15:50
  • I'll accept this answer as I needed to finish my task yesterday and this worked. However I'm still interested in using xmllint for this - see: http://stackoverflow.com/questions/43694722/extract-value-from-xml-file-with-namespaces-by-using-xmllint-in-bash – mles Apr 29 '17 at 10:53

Your XML isn't syntactically valid. The W3Schools XML validator page says so:

error on line 8 column 1. Extra content at the end of the document

That is because the header line <?xml version="1.0" encoding="utf-8"?> is the XML declaration, which identifies the document as XML and has to appear at its very start. All XML documents should begin with an XML declaration.

Also, xmllint should ship with Mac OS X by default, so you can just run

xmllint --xpath "/widget/name/text()" xml
Product Finder

The correct formatting for your XML would have been

<?xml version="1.0" encoding="UTF-8"?>
<widget id="123" version="1.3.1">
   <name>Product Finder</name>
   <description>Description</description>
</widget>
Inian
  • You're right, I've edited the sample XML code. So xmllint was working fine, but I had omitted some of the attributes in the `<widget>` tag. With the attributes of the production config.xml file I get `XPath set is empty`. I guess this is a problem with the namespace? – mles Apr 28 '17 at 11:23
  • @mles: you need to share a snippet of the non-working `XML` for me to have a look at. – Inian Apr 28 '17 at 11:28
  • I have accepted the solution with `grep` as I needed to finish my task yesterday and it worked. However I'm still interested in using xmllint for this. I've opened another question: http://stackoverflow.com/questions/43694722/extract-value-from-xml-file-with-namespaces-by-using-xmllint-in-bash – mles Apr 29 '17 at 10:54
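
On the `XPath set is empty` result from the comments: the production file declares a default namespace (xmlns="http://www.w3.org/ns/widgets"), so the un-prefixed path /widget/name no longer matches any element. One common workaround, sketched here under the assumption that the installed xmllint supports --xpath, is to match on local-name(), which ignores the namespace. The inline sample is a shortened stand-in for the real config.xml, not the production file.

```shell
# Shortened stand-in for the namespaced config.xml (an assumption, not the production file).
xml='<?xml version="1.0" encoding="utf-8"?>
<widget id="com.abc.app" version="1.3.1" xmlns="http://www.w3.org/ns/widgets">
    <name>Product Finder</name>
    <description>Description</description>
</widget>'

# local-name() compares only the element name and ignores its namespace,
# so the path matches even though <widget> and <name> live in the default namespace.
xpath="/*[local-name()='widget']/*[local-name()='name']/text()"

if command -v xmllint >/dev/null 2>&1; then
    # "-" tells xmllint to read the document from stdin.
    name=$(printf '%s\n' "$xml" | xmllint --xpath "$xpath" -)
    printf '%s\n' "$name"
else
    echo "xmllint not found on this system" >&2
fi
```

With the real file, the last argument would be config.xml instead of -.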

You should use an XML parser, such as xmllint.
Your XML is invalid and you should fix it. If you can't, use the following regex:

perl -n -e'/<name>(.*)<\/name>/ && print $1' file.xml
# Product Finder

Options:

-n                assume "while (<>) { ... }" loop around program
-e program        one line of program (several -e's allowed, omit programfile)
Pedro Lobito

The following loop uses only bash built-ins and will do the job, but it is not an XML parser:

while IFS=\> read -d\< -r tag value || [[ -n $tag ]]; do
    if [[ $tag == name ]]; then
        echo "$value";
        break;
    fi;
done < config.xml
Nahuel Fouilleul

You can do it using the multiple delimiter feature of awk:

awk -F'[<>]' '/name.*name/{print $3}' config.xml
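
To make the field numbering concrete: with -F'[<>]', the line "    <name>Product Finder</name>" splits into the leading whitespace ($1), name ($2), Product Finder ($3) and /name ($4). A slightly stricter sketch compares $2 to name exactly instead of matching name.*name anywhere in the line, and stops at the first hit. The sample input is an assumption mirroring the question's file, and the element still has to sit on a single line with no other tag before it.

```shell
# Sample input mirroring the question's config.xml (attributes shortened; an assumption).
xml='<?xml version="1.0" encoding="utf-8"?>
<widget id="com.abc.app" version="1.3.1" xmlns="http://www.w3.org/ns/widgets">
    <name>Product Finder</name>
    <description>
        Description
    </description>
</widget>'

# Split each line on "<" or ">"; for a one-line element, $2 is the tag name and $3 its text.
# "exit" stops after the first match so later <name> elements are ignored.
name=$(printf '%s\n' "$xml" | awk -F'[<>]' '$2 == "name" { print $3; exit }')
printf '%s\n' "$name"
```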
Tomerikoo
Joe Camel