Okay, this is an easy one, but I can't figure it out.
Basically, I want to extract all links (<a href="[^<>]*">[^<>]*</a>) from a big HTML file.
I tried to do this with sed, but I get all kinds of results, just not the one I want. I know my regexp is correct, because I can use it to replace all the links in a file:
sed 's_<a href="[^<>]*">[^<>]*</a>_TEST_g'
If I run that on input like this:
<div><a href="http://wwww.google.com">A google link</a></div>
<div><a href="http://wwww.google.com">A google link</a></div>
I get:
<div>TEST</div>
<div>TEST</div>
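To make the behaviour above reproducible, here is a small self-contained sketch that feeds the two sample lines to the same sed command. The variable names input and result are only illustrative, not part of the original setup.

```shell
#!/bin/sh
# The two sample lines from above, held in a variable instead of a file
input='<div><a href="http://wwww.google.com">A google link</a></div>
<div><a href="http://wwww.google.com">A google link</a></div>'

# Same substitution as above: every link on every line becomes TEST,
# but the surrounding <div> markup is printed unchanged
result=$(printf '%s\n' "$input" | sed 's_<a href="[^<>]*">[^<>]*</a>_TEST_g')
printf '%s\n' "$result"
```

This prints <div>TEST</div> twice, which shows the regexp matches, but sed still prints the unmatched parts of each line.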
How can I get rid of everything else and print only the matches? My preferred result would be:
<a href="http://wwww.google.com">A google link</a>
<a href="http://wwww.google.com">A google link</a>
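For reference, here is a hedged sketch of two common ways to print only the matched text, reusing the same regexp. It assumes GNU or BSD grep, since grep -o is not part of POSIX but both implementations provide it. The sed-only variant works with any sed but keeps at most one link per line. The variable names are illustrative.

```shell
#!/bin/sh
# Sample input from the question
input='<div><a href="http://wwww.google.com">A google link</a></div>
<div><a href="http://wwww.google.com">A google link</a></div>'

# Option 1: grep -o prints only the matched part of each line,
# one match per output line (several links on one line give several lines).
links=$(printf '%s\n' "$input" | grep -o '<a href="[^<>]*">[^<>]*</a>')
printf '%s\n' "$links"

# Option 2 (sed only): -n disables automatic printing, the \( \) group
# captures the link, and the p flag prints only lines where the
# substitution succeeded. Limitation: the greedy leading .* means only
# the last link on each line is kept.
links_sed=$(printf '%s\n' "$input" | sed -n 's_.*\(<a href="[^<>]*">[^<>]*</a>\).*_\1_p')
printf '%s\n' "$links_sed"
```

On the sample input both variants print the two <a href="http://wwww.google.com">A google link</a> lines. On files with several links per line, only the grep -o variant returns all of them.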
P.S. I know my regexp isn't the most flexible one, but it's good enough for my purposes.