-1

I have this HTML:

<a href="https://m.com/link/NX1B4efPlb2Es3xh1ip" target="_blank" style="-ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; cursor: pointer; word-wrap: break-word; word-break: break-word; color: #FFFFFF; text-decoration: none;">Specific word</a>

I'm looking for a regex that extracts only the href of the link whose text is "Specific word":

Extract the href https://m.com/link/NX1B4efPlb2Es3xh1ip when the text of the <a> element is Specific word.

Thank you

timothylhuillier
  • Possible duplicate of [RegEx match open tags except XHTML self-contained tags](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) – Freggar Jul 24 '19 at 08:55
  • Is `only text content` really your only possible input, or do you want to pick that specific tag out of arbitrary HTML? Because the latter isn't possible. – Pezo Jul 24 '19 at 09:08
  • [Parsing HTML with regex is a hard job](https://stackoverflow.com/a/4234491/372239) – Toto Jul 24 '19 at 09:20

2 Answers

4

If you really want to do it with Regex, I would suggest something like this:

/.*href=\"(.*?)\".*>Specific word.*/g

Explanation:

  • .* matches whatever comes before href
  • href matches the word href
  • \" matches the "
  • (.*?) is a non-greedy match for the href content, which stores the result in the capture group (the capture group is what you are looking for)
  • \" matches the closing "
  • .*> matches the rest of the tag until it is closed
  • Specific word matches the specific word
  • .* matches all the rest.
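A minimal C# sketch of reading the capture group, assuming .NET 5+ (C# 9 top-level statements). The pattern is the one above without the `/.../g` delimiters, which .NET does not use. `match.Value` is still the whole string, but `Groups[1]` holds only the href:

```csharp
using System;
using System.Text.RegularExpressions;

// The HTML from the question.
string html = "<a href=\"https://m.com/link/NX1B4efPlb2Es3xh1ip\" target=\"_blank\" " +
    "style=\"-ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; cursor: pointer; " +
    "word-wrap: break-word; word-break: break-word; color: #FFFFFF; text-decoration: none;\">" +
    "Specific word</a>";

// Same pattern as above; (.*?) is capture group 1.
var pattern = new Regex(".*href=\"(.*?)\".*>Specific word.*");

Match match = pattern.Match(html);

// match.Value is the entire matched text; Groups[1].Value is only the parenthesized part.
string href = match.Success ? match.Groups[1].Value : null;
Console.WriteLine(href);
```

This prints https://m.com/link/NX1B4efPlb2Es3xh1ip.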
Markus Weninger
  • I would suggest an XML parser. – Stefan Jul 24 '19 at 08:56
  • Thank you Markus for your help, but the regex matches the whole sentence and not only the href in C#. Online tester: [the regex C# tester](http://regexstorm.net/tester?p=.*href%3d%5c%22%28.*%3f%29%5c%22.*%3eSpecific+word.*&i=%3ca+href%3d%22https%3a%2f%2fm.com%2flink%2fNX1B4efPlb2Es3xh1ip%22+target%3d%22_blank%22+style%3d%22-ms-text-size-adjust%3a+100%25%3b+-webkit-text-size-adjust%3a+100%25%3b+cursor%3a+pointer%3b+word-wrap%3a+break-word%3b+word-break%3a+break-word%3b+color%3a+%23FFFFFF%3b+text-decoration%3a+none%3b%22%3eSpecific+word%3c%2fa%3e%0d%0a) – timothylhuillier Jul 24 '19 at 09:18
  • It matches the whole string but captures the link target (the parenthesized bit); just use that capture to extract the target. – Pezo Jul 24 '19 at 09:23
  • @timothylhuillier As Pezo said, just use the capture group and you are good to go :) – Markus Weninger Jul 24 '19 at 10:04
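Stefan's suggestion of an XML parser can be sketched with LINQ to XML (`System.Xml.Linq`), assuming .NET 5+ top-level statements and markup that is well-formed XML, as the snippet in the question happens to be. Real-world HTML often is not (unclosed `<br>`, `&nbsp;` entities), in which case `XElement.Parse` throws and a dedicated HTML parser would be needed. The second link and its URL are hypothetical, added only to show the filtering:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The question's link plus a made-up second link.
string html = "<a href=\"https://m.com/link/NX1B4efPlb2Es3xh1ip\" target=\"_blank\" " +
    "style=\"-ms-text-size-adjust: 100%; -webkit-text-size-adjust: 100%; cursor: pointer; " +
    "word-wrap: break-word; word-break: break-word; color: #FFFFFF; text-decoration: none;\">" +
    "Specific word</a>" +
    "<a href=\"https://m.com/link/other\">Other word</a>";

// Wrap the fragment in a dummy root so it parses as a single XML element.
XElement root = XElement.Parse("<root>" + html + "</root>");

// Take the href of every <a> whose text is exactly "Specific word".
var hrefs = root.Descendants("a")
    .Where(a => a.Value.Trim() == "Specific word")
    .Select(a => (string)a.Attribute("href"))
    .ToList();

foreach (string href in hrefs)
{
    Console.WriteLine(href);
}
```

Filtering on the element's text and reading the attribute by name avoids depending on attribute order or on what else sits between the tags.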
0

If you use capturing groups like so:

using System.Text.RegularExpressions;

Regex query = new Regex(".*href=\"(?<link>.*?)\".*>(?<name>.*?)</a>");

and then validate the result:

Match match = query.Match(input);

if (match.Success && match.Groups["name"].Value == "Specific word")
{
    // Do something with match.Groups["link"].Value
}

If there are potentially multiple results, you could loop through them like so:

MatchCollection mc = query.Matches(input);
foreach (Match m in mc)
{
    if (m.Groups["name"].Value == "Specific word")
    {
        // Do something with m.Groups["link"].Value
    }
}
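A self-contained sketch of the loop above, assuming .NET 5+ top-level statements and several links on the same line. On a single line, the leading greedy `.*` in the pattern above runs to the last `href` on that line, so earlier links are skipped. This version swaps the `.*` parts around the tag for `[^>]*` so each match stays inside one `<a ...>` tag; that adjustment is not part of the original answer. It still assumes a double-quoted href and no `>` inside attribute values. The second link is hypothetical and the `style` attribute is shortened:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Two links on ONE line; the second one (and its URL) is made up for illustration.
string input =
    "<a href=\"https://m.com/link/NX1B4efPlb2Es3xh1ip\" target=\"_blank\" style=\"color: #FFFFFF;\">Specific word</a> " +
    "<a href=\"https://m.com/link/other\" target=\"_blank\">Other word</a>";

// [^>]* cannot cross a '>', so each match is confined to a single opening <a ...> tag.
Regex query = new Regex("<a\\s[^>]*href=\"(?<link>[^\"]*)\"[^>]*>(?<name>.*?)</a>");

var links = new List<string>();
foreach (Match m in query.Matches(input))
{
    // Keep only links whose text is exactly "Specific word".
    if (m.Groups["name"].Value == "Specific word")
    {
        links.Add(m.Groups["link"].Value);
    }
}

foreach (string link in links)
{
    Console.WriteLine(link);
}
```

With this input, `Matches` returns two matches (one per link) and only the first passes the text check.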
andyb952