There is a website that lists the most popular websites. I am trying to write a script that takes two arguments: the first is an HTML file and the second is a text file. All the website URLs found in the HTML file should be written to the text file, so that at the end the text file contains lines like:
http://www.website1.com/
http://www.website2.com/
...
If I run
cat argument1.html
output like this is printed:
<a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_nl&url=http%3A%2F%2Fwww.100bestwebsites.org%2F"><img src="Holland.gif" height="33" width="50"><br>DUTCH</a></font></div></td>
<td width="10%">
<div align="center"><font face="Arial, Helvetica, sans-serif" size="2"><a href="http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_el&url=http%3A%2F%2Fwww.100bestwebsites.org%2F"><img src="Greece.gif" height="33" width="50"><br>GREEK</a></font></div></td>
As you can see, there is a lot of markup, but the URLs are buried in the middle of it, inside the href attributes. I need to use grep and sed.
Any help is appreciated. I know the basics of grep and sed, but it looks like the basics are not enough for this.
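
To make the goal concrete, here is a minimal sketch of the kind of script described above. It makes several assumptions: the URLs of interest are the double-quoted values of href attributes that start with http, the available grep supports the -o option (GNU and BSD grep do, although it is not required by POSIX), and duplicates should be dropped. The function name extract_urls is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: extract_urls HTML_FILE OUTPUT_FILE
# Assumes every URL of interest appears as href="http..." in double quotes.
extract_urls() {
    # grep -o prints each match on its own line, even when one input
    # line contains several links.
    grep -o 'href="http[^"]*"' "$1" |
        # sed strips the leading href=" and the trailing quote,
        # leaving only the URL.
        sed -e 's/^href="//' -e 's/"$//' |
        # sort -u removes duplicate URLs (an assumption about the desired output).
        sort -u > "$2"
}

# Run only when the script is called with both arguments.
if [ "$#" -eq 2 ]; then
    extract_urls "$1" "$2"
fi
```

Saved under a name such as extract.sh (also a made-up name), it would be run as sh extract.sh argument1.html urls.txt. Note that this keeps every matching link, including ones like the babelfish translation links in the sample. Dropping those would need an extra filter, and which links count as "the websites" is something only the page structure can tell.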