2

I was reading the question "Extract lines between 2 tokens in a text file using bash" because I have a very similar problem. I need to extract text from this XML file (and save it in a variable before printing it):

<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->

I only need the contents of value= (without the quotes, the brackets, or the 'value=' prefix). I think the search first has to find "GUI/LastVMSelected" to get to the right line, because other lines may contain a similar value field, and the value for that name is the one I want.

Mr_LinDowsMac

4 Answers

3

If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

  • -n: Suppress default print
  • /name="GUI\/LastVMSelected"/: only lines matching this pattern
  • s/.*value="\([^"]*\)".*/\1/p
    • substitute everything, capturing the parenthesized part (the value of value)
    • and print the result
Jan Hudec
  • Thanks, it works! but I need to catch it in a $variable because I need it in a script. – Mr_LinDowsMac Feb 01 '11 at 08:30
  • So why not just use varname=`sed whatever` (note the backticks)? – mpenkov Feb 01 '11 at 08:48
  • I found a way: VAR=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' /some/place.xml); echo $VAR. That's what I wanted. Now I can use the variable $VAR elsewhere in the script. – Mr_LinDowsMac Feb 01 '11 at 08:58
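
The approach settled on in the comments above can be sketched end to end as follows. The sample file contents and the names xml_file and last_vm are assumptions made for illustration; only the sed expression comes from the answer.

```shell
#!/usr/bin/env bash
# Sketch: capture the value attribute of GUI/LastVMSelected in a variable.
# The sample file is an assumption that mirrors the line in the question.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
EOF

# $(...) is command substitution: sed's output becomes the variable's value.
last_vm=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$xml_file")

# Quote the expansion so the value is passed through unchanged.
echo "$last_vm"

rm -f "$xml_file"
```

The $(...) form behaves like the backticks suggested earlier in the comments, and it nests more easily when the command itself contains quotes.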
1

I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. There's some documentation for querying XML docs here.
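
A minimal sketch of what such a query might look like, assuming xmlstarlet is installed under that name (some systems install the binary as xml) and that the file declares no default XML namespace. The sample file, the ExtraData wrapper element, and the variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: query the attribute with XMLStarlet's "sel" subcommand and an XPath.
# Assumes xmlstarlet is installed and the file has no default XML namespace.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<?xml version="1.0"?>
<ExtraData>
  <ExtraDataItem name="GUI/Other" value="not-this-one"/>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

last_vm=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # -t starts a template; -v prints the value of the XPath expression.
  last_vm=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' "$xml_file")
  echo "$last_vm"
else
  echo "xmlstarlet not found; install it to try this sketch" >&2
fi

rm -f "$xml_file"
```

Because the match is done on the parsed document, attribute order and line breaks inside the tag do not matter. If the real file declares a default namespace, the XPath would need a prefix bound with the -N option.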

Brian Agnew
1

Use this:

for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done
  • grep gets you only the lines you need
  • cut splits the lines using some separator, and returns the Nth result of the split
  • -d " " sets the separator to space
  • -f3 returns the third result (1-based indexing)
  • ${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and trailing slash, etc.

Obviously if the order of the fields changes, this will break, but if you're just after something quick and dirty that works, this should be it.
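
To make the substring step concrete, here is a small sketch that applies it to the field cut would return for the sample line, next to a pattern-based variation that does not depend on the value being exactly 36 characters long. The variation is not part of the answer above, and the variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: two ways to strip the attribute wrapper from the third field.
# The sample field is what cut -d " " -f3 yields for the line in the question
# when that line has no leading indentation.
field='value="14cd3204-4774-46b8-be89-cc834efcba89"/>'

# Fixed offsets, as in the answer: skip 7 characters (value="), keep 36.
by_offset=${field:7:36}

# Pattern-based: drop the leading value=" and then everything from the next quote on.
without_prefix=${field#value=\"}
by_pattern=${without_prefix%%\"*}

echo "$by_offset"
echo "$by_pattern"
```

Both forms still rely on cut having picked the right field, so the caveat about field order applies to either one.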

mpenkov
  • That does not strip the value= and quotes. – Jan Hudec Feb 01 '11 at 08:18
  • That depends on the string being at a particular position in the line and is not at all reliable. It also breaks the line up on white space so it wouldn't even work if it were. – Dennis Williamson Feb 01 '11 at 08:28
  • I've specifically said that **"if the order of the fields changes, this will break"**. Nobody sane would expect that to be reliable after reading that disclaimer. – mpenkov Feb 01 '11 at 08:31
  • It seems that using sed gives the cleanest output, but I still need to capture the value in a variable, because I need it in a script. – Mr_LinDowsMac Feb 01 '11 at 08:43
0

Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!--more labels up this line-->/{ - if the starting marker is found, then
    • :a - label "a"
      • n - read the next line
      • /<!--more labels and text down this line-->/b - if it's the ending marker, branch to the end of the script, which leaves the loop
      • \|GUI/LastVMSelected| - if the line matches the string
        • s/.*value="\([^"]*\)".*/\1/p - replace the whole line with the text after 'value="' and before the next quote, and print it
    • ba - branch to label "a"
  • } end if
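
A runnable sketch of this label-and-branch loop on a small sample file, written one command per line so each step in the explanation maps to a line of the script. The file contents, the marker comments, and the variable names are assumptions for illustration. The substitution here uses .* on both sides of the attribute so that only the value is printed.

```shell
#!/usr/bin/env bash
# Sketch: only look for GUI/LastVMSelected between two marker lines.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="outside-the-markers"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/Other" value="not-this-one"/>
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
EOF

last_vm=$(sed -n '
/<!--more labels up this line-->/{
  :a
  n
  /<!--more labels and text down this line-->/b
  \|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p
  ba
}' "$xml_file")

echo "$last_vm"

rm -f "$xml_file"
```

The first line of the sample sits before the starting marker, so it is skipped even though its name matches.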
Dennis Williamson