How to extract from a file text between tokens using bash scripts

Question

I was reading the question "Extract lines between 2 tokens in a text file using bash" because I have a very similar problem. I need to extract a piece of text from this XML file (and save it to a $variable before printing):

<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->

I only need the contents of value= (obviously without the brackets and without the 'value=' part). I think the search first has to find "GUI/LastVMSelected" to locate the right line, because other lines may contain a similar value field, and the value on that particular line is the one I want.

If this is XML/HTML, you should consider using a proper XML parser. – ajreal Feb 01 '11 at 08:13

score 3 · Accepted Answer · answered Feb 01 '11 at 08:17

If they are on the same line (as they seem to be from your example), it's even easier. Just:

sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p'

Explanation:

-n: Suppress default print
/name="GUI\/LastVMSelected"/: only lines matching this pattern
s/.*value="\([^"]*\)".*/\1/p
- substitute everything, capturing the parenthesized part (the value of value)
- and print the result
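To see the command in action, here is a minimal sketch that runs it on a throwaway file. The file name sample.xml, the surrounding ExtraData element, and the GUI/SomethingElse line are made up for illustration; only the GUI/LastVMSelected line comes from the question.

```shell
# Build a small sample file (hypothetical apart from the LastVMSelected line).
cat > sample.xml <<'EOF'
<ExtraData>
  <ExtraDataItem name="GUI/SomethingElse" value="not-this-one"/>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

# Print only the value attribute of the line whose name is GUI/LastVMSelected.
sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' sample.xml
# prints: 14cd3204-4774-46b8-be89-cc834efcba89
```

The decoy line is skipped because the address pattern restricts the substitution to the one line that names GUI/LastVMSelected.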

Jan Hudec

Thanks, it works! But I need to capture it in a $variable because I need it in a script. – Mr_LinDowsMac Feb 01 '11 at 08:30
So why not just use `varname=`sed whatever`` (note the backticks)? – mpenkov Feb 01 '11 at 08:48
I found a way: `VAR=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' /some/place.xml); echo "$VAR"`. That's what I wanted. Now I can use the variable $VAR elsewhere in the script. – Mr_LinDowsMac Feb 01 '11 at 08:58
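The capture discussed in these comments could look roughly like the sketch below. The variable name xml_file and the temporary file are assumptions added so the snippet runs on its own; in a real script xml_file would point at the actual XML file. The empty-result check is also an addition, not something from the thread.

```shell
# Hypothetical input file; in a real script, point xml_file at your own XML file.
xml_file=$(mktemp)
printf '%s\n' '<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>' > "$xml_file"

# $(...) captures the command's standard output; it nests more cleanly than backticks.
VAR=$(sed -ne '/name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p' "$xml_file")

# Quote the expansion, and handle the case where no line matched.
if [ -n "$VAR" ]; then
    printf 'Last selected VM: %s\n' "$VAR"
else
    printf 'GUI/LastVMSelected not found in %s\n' "$xml_file" >&2
fi

# Clean up the demo file.
rm -f "$xml_file"
```

Because sed -n prints nothing when no line matches, an empty $VAR is the signal that the label was not found.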

score 1 · Answer 2 · answered Feb 01 '11 at 08:14

I'm assuming that you're extracting from an XML document. If that is the case, have a look at the XMLStarlet command-line tools for processing XML. There's some documentation for querying XML docs here.
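As a rough sketch of what an XMLStarlet query might look like: this assumes the tool is installed under the name xmlstarlet (some systems install it as xml) and that the elements are not in an XML namespace, which the excerpt in the question does not show either way. The file name sample-xmlstarlet.xml, the ExtraData wrapper element, and the variable vm_id are invented for the example.

```shell
# Hypothetical well-formed document wrapped around the line from the question.
cat > sample-xmlstarlet.xml <<'EOF'
<ExtraData>
  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
</ExtraData>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # sel queries the document; -t starts a template; -v prints the value of an XPath expression.
    vm_id=$(xmlstarlet sel -t -v '//ExtraDataItem[@name="GUI/LastVMSelected"]/@value' sample-xmlstarlet.xml)
    printf '%s\n' "$vm_id"
else
    echo "xmlstarlet is not installed" >&2
fi
```

Unlike the line-oriented tools, the XPath query does not care about attribute order or about how the element is split across lines. If the real file declares a default namespace, the query would need a namespace binding, which this sketch leaves out.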

Brian Agnew

score 1 · Answer 3 · answered Feb 01 '11 at 08:16

Use this:

for f in `grep "GUI/LastVMSelected" filename.txt | cut -d " " -f3`; do echo ${f:7:36}; done

grep gets you only the lines you need
cut splits the lines using some separator, and returns the Nth result of the split
-d " " sets the separator to space
-f3 returns the third result (1-based indexing)
${f:7:36} extracts the substring starting at index 7 that is 36 characters long. This gets rid of the leading value=" and trailing slash, etc.

Obviously if the order of the fields changes, this will break, but if you're just after something quick and dirty that works, this should be it.
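The fixed offsets in ${f:7:36} depend on the value being exactly 36 characters long and on the line having no leading spaces. A slightly more tolerant variant, still quick and dirty, strips by pattern with shell parameter expansion instead. This is a sketch of a swapped-in technique, not the method described above; the variable names line and value are made up, and filename.txt is created here only so the example runs.

```shell
# Sample input so the snippet is runnable; the leading spaces are deliberate.
printf '%s\n' '  <ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>' > filename.txt

# Keep only the first matching line.
line=$(grep 'name="GUI/LastVMSelected"' filename.txt | head -n 1)

# Remove everything up to and including the first value=" ...
value=${line#*value=\"}
# ... then everything from the next double quote onward.
value=${value%%\"*}

echo "$value"
# prints: 14cd3204-4774-46b8-be89-cc834efcba89
```

This no longer cares about indentation or the length of the value, but it is still plain text matching and will misfire on input where value=" appears earlier on the line in some other context.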

edited Feb 01 '11 at 08:24

answered Feb 01 '11 at 08:16

mpenkov

That does not strip the value= and quotes. – Jan Hudec Feb 01 '11 at 08:18
That depends on the string being at a particular position in the line and is not at all reliable. It also breaks the line up on white space so it wouldn't even work if it were. – Dennis Williamson Feb 01 '11 at 08:28
I've specifically said that **"if the order of the fields changes, this will break"**. Nobody sane would expect that to be reliable after reading that disclaimer. – mpenkov Feb 01 '11 at 08:31
It seems that sed gives the cleanest output, but I still need to capture the value in a variable, because I need it in a script. – Mr_LinDowsMac Feb 01 '11 at 08:43

score 0 · Answer 4 · answered Feb 01 '11 at 08:21

Using my answer from the question you linked:

sed -n '/<!--more labels up this line-->/{:a;n;/<!--more labels and text down this line-->/b;\|GUI/LastVMSelected|s/.*value="\([^"]*\)".*/\1/p;ba}' inputfile

Explanation:

-n - don't do an implicit print
/<!--more labels up this line-->/{ - if the starting marker is found, then
- :a - label "a"
  - n - read the next line
  - /<!--more labels and text down this line-->/b - if it's the ending marker, branch to the end of the script, which leaves the loop
  - \|GUI/LastVMSelected| - if the line matches the string
    - s/.*value="\([^"]*\)".*/\1/p - replace the whole line with the text between value=" and the next quote, and print it
- ba - branch to label "a"
} end if
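The same idea, restricting the search to the lines between the two marker comments, can also be sketched with a sed address range. This is a different technique from the explicit loop above, shown only for comparison. Unlike the loop, the range also tests the marker lines themselves, which is harmless here because they do not contain GUI/LastVMSelected. The file name inputfile-demo, the variable result, and the decoy line before the first marker are invented for the example.

```shell
# Hypothetical input: one decoy line outside the markers, one real line inside.
cat > inputfile-demo <<'EOF'
<ExtraDataItem name="GUI/LastVMSelected" value="outside-the-markers"/>
<!--more labels up this line-->
<ExtraDataItem name="GUI/LastVMSelected" value="14cd3204-4774-46b8-be89-cc834efcba89"/>
<!--more labels and text down this line-->
EOF

# Apply the substitution only to lines from the start marker through the end marker.
result=$(sed -n '/<!--more labels up this line-->/,/<!--more labels and text down this line-->/{
  /name="GUI\/LastVMSelected"/s/.*value="\([^"]*\)".*/\1/p
}' inputfile-demo)

echo "$result"
# prints: 14cd3204-4774-46b8-be89-cc834efcba89
```

The decoy line is ignored because it appears before the starting marker, so it never falls inside the address range.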
