
I'm new to shell scripting, so I need some help. In a shell script, I need to get the value of the href attribute of an <a> tag in an HTML file, selecting the tag by its class.

For example:

<a class="other class" href="value I don't need"></a> <a class="some class" href="url I need"></a>

In this case I need the href value of the <a> tag that has the class "some class", and I need to store that value in a variable. The solution has to use sed or grep. I'm not good with regex at all, so I'd appreciate your help.

dakairus

3 Answers


An alternative way using sed and grep.

var=$(grep 'class="some class"' file | sed -r 's/^.+href="([^"]+)".+$/\1/')

First grep selects the matching line (file is your HTML file), then sed replaces the entire line with only the captured group in parentheses, which is the href value.
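Because ^.+ is greedy, the captured group ends up being the last href on the line. With both tags on one line, as in the question, the command therefore only works because the wanted tag happens to come last. The sketch below runs the same pipeline on the question's line and on a hypothetical copy with the two tags swapped. It uses sed -E, which GNU sed treats the same as -r and BSD sed also accepts. The helper name extract is made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample lines: the question's order, then the same two tags swapped.
in_order=$(cat <<'EOF'
<a class="other class" href="value I don't need"></a> <a class="some class" href="url I need"></a>
EOF
)
swapped=$(cat <<'EOF'
<a class="some class" href="url I need"></a> <a class="other class" href="value I don't need"></a>
EOF
)

# Same grep + sed pipeline as the answer; the greedy ^.+ runs up to the LAST href=" on the line.
extract() {
  printf '%s\n' "$1" | grep 'class="some class"' | sed -E 's/^.+href="([^"]+)".+$/\1/'
}

first=$(extract "$in_order")   # wanted tag is last on the line
second=$(extract "$swapped")   # wanted tag is first on the line
printf 'in order: %s\nswapped:  %s\n' "$first" "$second"
```

The first line yields url I need, but the swapped line yields value I don't need, even though grep selected the line because of "some class".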

Edit: if you have multiple <a> tags on one line, it gets a bit trickier. If you can assume that the tag is always formatted like the examples, you can try this:

var=$(grep 'class="some class"' file | sed -r 's/^.+class="some class"\s+href="([^"]+)".+$/\1/')

If you can't assume that (for example, the href sometimes comes before the class), you're better off using an HTML parser, because regular expressions can't parse HTML reliably.
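The parser route can be sketched with xmllint from libxml2. That tool choice is an assumption, since the answer names no particular parser. The sketch writes the question's sample markup to a temporary file, then lets xmllint's HTML parser and an XPath expression select the href, so attribute order and line layout no longer matter. Note that @class="some class" compares the whole attribute value, so it would not match class="some class extra".

```shell
#!/bin/sh
# Sketch only: assumes xmllint (libxml2's command-line tool) is installed.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT

# The question's sample markup; in practice, point xmllint at your own HTML file.
cat > "$tmp" <<'EOF'
<a class="other class" href="value I don't need"></a> <a class="some class" href="url I need"></a>
EOF

var=
if command -v xmllint >/dev/null 2>&1; then
  # --html parses the file leniently as HTML; the XPath returns the href of the <a>
  # whose class attribute is exactly "some class". Parser warnings are discarded.
  var=$(xmllint --html --xpath 'string(//a[@class="some class"]/@href)' "$tmp" 2>/dev/null)
  echo "$var"
else
  echo "xmllint not found" >&2
fi
```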

cbreezier

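Here is one way with awk. Splitting each line on href=" means every field after the first begins with an href value, and (when class comes before href) that tag's class sits in the preceding field, so this also works when both tags are on one line:

awk -F'href="' '{for (i = 2; i <= NF; i++) if ($(i-1) ~ /class="some class"/) {split($i, a, "\""); print a[1]}}' file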
url I need
Jotne

Use grep to select the line and sed to print the href value: grep 'some class' | sed -n 's/.*href="\([^"]*\)".*/\1/p'

$ cat aaa
<a class="other class" href="value I don't need"></a>
<a class="some class" href="url I need"></a>

$ grep 'some class' aaa | sed -n 's/.*href="\([^"]*\)".*/\1/p'
url I need
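This pipeline works one line at a time. In the demo each tag is on its own line, but in the question both tags share a line, and then the greedy .*href=" lands on the last href on the line. Below is a portable sketch that first uses tr to give each tag its own line, so the class test and the href extraction always look at the same tag, whichever attribute comes first. The sample line is hypothetical: the wanted tag comes first and its href precedes its class. The sketch still assumes double-quoted attributes and no < inside attribute values.

```shell
#!/bin/sh
# Hypothetical input: wanted tag first, with href before class.
html=$(cat <<'EOF'
<a href="url I need" class="some class"></a> <a class="other class" href="value I don't need"></a>
EOF
)

# 1. tr turns every '<' into a newline, so each tag sits on its own line.
# 2. grep keeps only the tag line carrying the wanted class.
# 3. sed prints just the href value; [^"]* stops at the closing quote.
var=$(printf '%s\n' "$html" | tr '<' '\n' | grep 'class="some class"' |
  sed -n 's/.*href="\([^"]*\)".*/\1/p')

echo "$var"
```

Without the tr step, the same grep and sed would print value I don't need for this line.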
jaekid