I would like to write a regular expression that matches text of the form class="r"><a href="http://www.hihostels.com/" where:

1. class="r"><a href=" is fixed
2. http://www.hihostels.com/ is variable
3. the closing " is fixed

Mohammad Olfatmiri
[Don't parse HTML with a RegEx.](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Ondrej Tucny May 01 '13 at 12:50

1 Answer

I suggest you use an HTML parsing engine such as HTMLAgilityPack (http://htmlagilitypack.codeplex.com/). These parsing tools tend to have a rather steep learning curve, though. If you are looking for something quick and easy, and can accept that it might be tripped up by edge cases, consider the following PowerShell example. The regular expression itself uses only basic syntax, so it should carry over to most regex engines:

    # Sample input containing the link to capture
    $String = '<div class="r"><a href="http://www.hihostels.com/" class="RememberToVote">click me</a></div>'
    # Group 1 captures everything between the fixed prefix and the next double quote
    ([regex]'class="r"><a href="([^"]*)"').Matches($String) | ForEach-Object {
        Write-Host "at $($_.Groups[1].Index) = '$($_.Groups[1].Value)'"
    } # next match

This yields:

    at 24 = 'http://www.hihostels.com/'

This works by assuming the input always contains the literal string class="r"><a href=" immediately followed by the characters you want to capture. The capture group ([^"]*) matches any run of characters that are not double quotes, so it stops at the next double quote, which the final " in the pattern then matches.
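As a rough sketch of how the same pattern behaves on input with several links, the snippet below collects every captured URL into an array. The sample HTML, the variable names, and the named group `url` are made up for illustration; they are not part of the original question.

```powershell
# Hypothetical sample input: two links that follow class="r"> and one that does not
$Html = '<h3 class="r"><a href="http://www.hihostels.com/">Hostels</a></h3>' +
        '<h3 class="r"><a href="http://example.com/page?id=1">Example</a></h3>' +
        '<h3 class="other"><a href="http://example.org/">Ignored</a></h3>'

# Same pattern as above, with the capture group given a name (an illustrative choice)
$Pattern = [regex]'class="r"><a href="(?<url>[^"]*)"'

# Collect the captured value from every match; @() keeps the result an array
$Urls = @($Pattern.Matches($Html) | ForEach-Object { $_.Groups['url'].Value })

$Urls
# http://www.hihostels.com/
# http://example.com/page?id=1
```

The third link is skipped because its class attribute is not "r". The same strictness is the weak point of the approach: single-quoted attributes, extra whitespace, or another attribute placed before href would all prevent a match, which is where an HTML parser becomes the safer choice.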

Ro Yo Mi