
I'd like a command that extracts just the value on line 8 of this file, without the surrounding <string> and </string> tags; in other words, it should output only 3.2.2.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>BuildVersion</key>
    <string>8</string>
    <key>CFBundleShortVersionString</key>
    <string>3.2.2</string>
    <key>CFBundleVersion</key>
    <string>399.12</string>
    <key>ProjectName</key>
    <string>ServerApp</string>
    <key>SourceVersion</key>
    <string>399012000000000</string>
</dict>
</plist>

Your suggestions are much appreciated! Thanks, Dan


4 Answers


As Steven Penny points out (see the linked answer "RegEx match open tags except XHTML self-contained tags"), parsing XML requires a proper XML parser. One of them is xmllint:

$ xmllint --xpath '/plist/dict/string[2]/text()' file.xml

or with xmlstarlet:

$ xmlstarlet sel -t -v '/plist/dict/string[2]/text()' file.xml

or with saxon-lint:

$ saxon-lint --xpath '/plist/dict/string[2]/text()' file.xml

An even better XPath expression, if you want the version number that follows the CFBundleShortVersionString key regardless of its position in the file:

'//key[text()="CFBundleShortVersionString"]/following-sibling::string[1]/text()'
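A minimal sketch of how the key-based expression might be wrapped in a reusable shell function, assuming xmllint (from libxml2) is installed and the plist is stored as XML rather than in Apple's binary plist format. The function name `plist_string_for_key` and the trimmed sample file are illustrative, not part of the answer.

```shell
#!/bin/sh
# Illustrative helper (hypothetical name): print the <string> value that
# follows a given <key>, using the key-based XPath expression.
# Assumes xmllint (libxml2) is on PATH and the plist is XML, not binary.
plist_string_for_key() {
    key=$1
    file=$2
    # The key is spliced into the XPath unescaped, so a key containing a
    # double quote would break the expression.
    xmllint --xpath "//key[text()=\"$key\"]/following-sibling::string[1]/text()" "$file"
}

# Usage: query a trimmed copy of the sample plist in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>BuildVersion</key>
    <string>8</string>
    <key>CFBundleShortVersionString</key>
    <string>3.2.2</string>
    <key>CFBundleVersion</key>
    <string>399.12</string>
</dict>
</plist>
EOF

# $(...) drops any trailing newline xmllint may print after the text node.
version=$(plist_string_for_key CFBundleShortVersionString "$sample")
printf '%s\n' "$version"
rm -f "$sample"
```

Because the lookup is by key name, the same call keeps working if keys are added or reordered, unlike the `string[2]` position used in the first commands.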
Gilles Quénot
awk 'NR==8,$0=$3' FS='[<>]' file.xml

Result

3.2.2
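  • Set the field separator to < or >, so that on line 8 field 3 is 3.2.2
  • NR==8,$0=$3 is a range pattern that starts on line 8 and also ends there: the end condition replaces the record with field 3, and because that value is non-empty it counts as true, so awk prints the modified record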
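Reading a fixed line number breaks as soon as keys are added or reordered. The sketch below reuses the same `FS='[<>]'` trick but looks the value up by key name instead. It assumes one element per line, as in the sample plist; it is a text heuristic, not an XML parser, and `plist_value_awk` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the value on the line after <key>NAME</key>.
# With FS set to < or >, "    <key>X</key>" splits so that $2 is "key" and
# $3 is X, and "    <string>V</string>" gives $2 "string" and $3 V.
# Text heuristic only: assumes one element per line, as in the sample plist.
plist_value_awk() {
    awk -F'[<>]' -v key="$1" '
        # The previous line held the wanted key: print this value and stop.
        found { print $3; exit }
        # Remember that the next line carries the value for this key.
        $2 == "key" && $3 == key { found = 1 }
    ' "$2"
}

# Usage: query a trimmed copy of the sample plist in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>BuildVersion</key>
    <string>8</string>
    <key>CFBundleShortVersionString</key>
    <string>3.2.2</string>
    <key>CFBundleVersion</key>
    <string>399.12</string>
</dict>
</plist>
EOF

version=$(plist_value_awk CFBundleShortVersionString "$sample")
printf '%s\n' "$version"
rm -f "$sample"
```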

RegEx match open tags except XHTML self-contained tags

Zombo
    Thanks Steven, this works beautifully and is the most concise. I'll mark this as the accepted answer once the ten-minute wait is over. – Dan Dec 14 '14 at 23:40
xmllint 'myfile'|sed -n '8 s#.*>\([[:digit:].]\{1,\}\)<.*#\1#p'

If you already know the value and 3.2.2 appears only once in the file, you can match it directly with sed (the dots are escaped so they match literal dots only):

xmllint 'myfile'|sed -n 's#.*>\(3\.2\.2\)<.*#\1#p'
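If neither the line number nor the value is known in advance, sed can instead anchor on the key and print the following line. A minimal sketch, assuming the `<string>` value sits on the line directly after its `<key>`, as in the sample plist (again a text heuristic rather than real XML parsing):

```shell
#!/bin/sh
# Usage setup: a trimmed copy of the sample plist in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>BuildVersion</key>
    <string>8</string>
    <key>CFBundleShortVersionString</key>
    <string>3.2.2</string>
    <key>CFBundleVersion</key>
    <string>399.12</string>
</dict>
</plist>
EOF

# -n: print nothing unless asked.
# The /<key>...<\/key>/ address selects the key line; inside the braces,
# n loads the next line, and s///p prints the captured text only if that
# line is a <string> element.
version=$(sed -n '/<key>CFBundleShortVersionString<\/key>/{
n
s/.*<string>\(.*\)<\/string>.*/\1/p
}' "$sample")
printf '%s\n' "$version"
rm -f "$sample"
```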
repzero

With sed, it can be done as follows:

$ sed -En '8s#^[[:space:]]*<[a-z]+>([0-9.]+)</[a-z]+>#\1#p' file.xml
3.2.2
Kannan Mohan