
I have a file with this HTML code inside:

 <p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>

Now I would like to extract just the source URL, e.g. http://mypage.com/ourpicture123. How can I do this with sed? Ideally I would match on 'src="' before the URL and '"' after it.

  • possible duplicate of [Easiest way to extract the urls from an html page using sed or awk only](http://stackoverflow.com/questions/1881237/easiest-way-to-extract-the-urls-from-an-html-page-using-sed-or-awk-only) – NeronLeVelu Feb 10 '15 at 11:35

2 Answers


Using sed:

$ sed -n 's/.*\bsrc="\([^"]*\)".*/\1/p' file
http://mypage.com/ourpicture123
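The \b word boundary in the sed command above is a GNU sed extension, not POSIX, so it may not work with other sed implementations. A minimal sketch of a variant without it, assuming (as in the question's HTML) that src is always preceded by whitespace; the helper name extract_src_sed is hypothetical:

```shell
#!/bin/sh
# Variant of the sed one-liner without \b (a GNU extension).
# Assumption: src is preceded by whitespace, as in the question's HTML.
# extract_src_sed is a hypothetical helper name.
extract_src_sed() {
    # -n suppresses default output; the p flag prints only lines
    # where the substitution succeeded, i.e. lines containing src="...".
    sed -n 's/.*[[:space:]]src="\([^"]*\)".*/\1/p' "$1"
}

# Demo on the HTML line from the question.
sample=$(mktemp)
printf '%s\n' ' <p class="center-block"><img alt="ourpicture" class="picture" src="http://mypage.com/ourpicture123" /></p>' > "$sample"
extract_src_sed "$sample"
rm -f "$sample"
```

Requiring whitespace before src also keeps attributes such as data-src from matching, at the cost of missing src if it is ever preceded by something other than whitespace.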

Using grep (the -P option requires GNU grep built with PCRE support):

grep -oP '\bsrc="\K[^"]*(?=")' file

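If a line contains more than one src attribute, the sed command above prints only the last value, because the leading greedy .* consumes the earlier ones. The grep command does not have this problem: -o prints every match on its own line. In the grep command, \K discards the already matched src=" characters, so they are not included in the printed output.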
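If you need every src value on a line but your grep lacks -P, one option is to let grep -o print each whole src="..." match and strip the wrapper with sed. A minimal sketch; the helper name extract_srcs is hypothetical, and grep -o, while supported by GNU and BSD grep, is not part of POSIX:

```shell
#!/bin/sh
# Print every src="..." value in a file, one per line, even when a
# single line holds several src attributes.
# extract_srcs is a hypothetical helper name; grep -o is not POSIX.
extract_srcs() {
    # grep -o emits each src="..." match separately;
    # sed then removes the leading src=" and the trailing quote.
    grep -o 'src="[^"]*"' "$1" | sed 's/^src="//; s/"$//'
}

# Demo on a made-up line containing two images.
sample=$(mktemp)
printf '%s\n' '<p><img src="http://mypage.com/a" /><img src="http://mypage.com/b" /></p>' > "$sample"
extract_srcs "$sample"
rm -f "$sample"
```

Unlike the \b version, this pattern has no word boundary, so it would also pick up attributes such as data-src.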

Avinash Raj

Here is an awk version:

awk -F'src="' '{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123

Or like this:

awk -F'src="' '{sub(/".*$/,"",$2);print $2}' file
http://mypage.com/ourpicture123

If the file has several lines and you only want output from lines that contain src=, use:

awk -F'src="' 'NF>1{split($2,a,"\"");print a[1]}' file
http://mypage.com/ourpicture123
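These awk versions still read only $2, so a line with two images prints only the first URL. With src=" as the field separator, every field after the first starts with a URL, so a minimal sketch can loop over $2 through $NF instead; the helper name extract_srcs_awk is hypothetical:

```shell
#!/bin/sh
# Print every src value on every line by looping over all fields
# after the src=" separator. extract_srcs_awk is a hypothetical name.
extract_srcs_awk() {
    awk -F'src="' '{
        # Lines without src=" have NF == 1, so the loop body never runs.
        for (i = 2; i <= NF; i++) {
            # Keep the text up to the closing double quote.
            split($i, a, "\"")
            print a[1]
        }
    }' "$1"
}

# Demo on made-up input: two images on one line, then a line without src.
sample=$(mktemp)
printf '%s\n' \
    '<img src="http://mypage.com/a" /><img src="http://mypage.com/b" />' \
    '<p>no image here</p>' > "$sample"
extract_srcs_awk "$sample"
rm -f "$sample"
```

Because the loop does nothing when NF is 1, the NF>1 guard becomes unnecessary here. Like the one-liners, this also splits on data-src=" if such attributes appear.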
Jotne