I have a huge HTML file that contains a lot of things I don't need, but somewhere inside it are URLs in the following format:

<a href="http://www.retailmenot.com/" class=l

I'm trying to extract the URLs. I tried this, to no avail:

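use strict;
use warnings;

open(my $fh, '<', 'HTML.htm') or die "Cannot open HTML.htm: $!";
my @str = <$fh>;    # one element per line of the file

# grep keeps the matching *lines*; the capture group's contents are discarded
my @matches = grep { m/a href="(.+?)" class=l/ } @str;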

Any idea how to match this?
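The `grep` above returns every line the pattern matches, not the text inside the parentheses, so `@matches` ends up holding whole lines. A minimal sketch of a capture-based alternative follows. It assumes every wanted link is written exactly like the sample (`<a href="URL" class=l`: a double-quoted `href`, followed by `class=l`, in that order); `extract_urls` and the sample string are hypothetical names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return every href value of an <a> tag that is
# followed by " class=l". Assumes double-quoted href and this exact
# attribute order, as in the sample line; other layouts are missed.
sub extract_urls {
    my ($html) = @_;
    # In list context, a match with /g and one capture group returns
    # the captured value from every match in the string.
    my @urls = $html =~ m/<a\s+href="([^"]+)"\s+class=l\b/gi;
    return @urls;
}

# Hypothetical sample input standing in for the contents of HTML.htm.
my $sample = '<a href="http://www.retailmenot.com/" class=l>RetailMeNot</a>'
           . '<a href="http://example.com/other" class=x>skip</a>';

print "$_\n" for extract_urls($sample);
```

To run it on the real file, slurp `HTML.htm` into one scalar (for example with `local $/;` inside a `do` block) and pass that scalar in; reading the whole file at once also catches a tag whose attributes are split across two lines, which a line-by-line `grep` would miss.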

brian d foy
snoofkin
  • Tried it with: grep -iP --color=auto 'a href="(.+?)"\sclass=l' FILE.TXT – snoofkin Sep 25 '10 at 00:47
  • @soulSurfer2010, please edit your revisions and what you tried (the two comments previous to this one) *into the question* (hit the 'edit' link below the tags.) It looks better formatted properly, and far easier to read and work with. – David Thomas Sep 25 '10 at 00:58
  • Why hasn't anybody linked to this classic http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags ? – Yuji Sep 25 '10 at 04:21
  • What Yuji says: just don't do it this way. – reinierpost Sep 27 '10 at 06:55
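Following the advice in these comments, a parser can pick out the links without depending on attribute order or quoting style. The sketch below is one way to do that, assuming the CPAN HTML-Parser distribution (which provides `HTML::TokeParser`) is installed; it is not core Perl. `extract_urls_parsed` and the sample markup are hypothetical, and the check `class eq 'l'` assumes the attribute holds exactly `l` rather than a list of several classes.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the CPAN HTML-Parser distribution (not core Perl)

# Hypothetical helper: collect href values of <a> tags whose class is "l".
sub extract_urls_parsed {
    my ($html) = @_;
    # new() accepts a file name, a filehandle, or a reference to a string.
    my $p = HTML::TokeParser->new(\$html);
    my @urls;
    # get_tag('a') skips ahead to the next <a> start tag and returns
    # [ $tagname, \%attr, \@attrseq, $text ]; it returns undef at the end.
    while (my $tag = $p->get_tag('a')) {
        my $attr = $tag->[1];    # attribute names are lowercased by the parser
        next unless defined $attr->{class} && $attr->{class} eq 'l';
        push @urls, $attr->{href} if defined $attr->{href};
    }
    return @urls;
}

# Hypothetical sample: attribute order and quoting differ from the question,
# which a fixed regex would not tolerate but the parser does.
my $sample = '<a class="l" href="http://www.retailmenot.com/">RetailMeNot</a>'
           . '<a href="http://example.com/other" class=x>skip</a>';

print "$_\n" for extract_urls_parsed($sample);
```

Because the constructor also accepts a file name, `HTML::TokeParser->new('HTML.htm')` would let the same loop read the real document directly instead of slurping it into a string first.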