I'm looking for a regular expression that can extract the href from this:

<a href="/tr/blog.php?post=3593&user=930">

There are hundreds of links on the page so I need to extract only those that contain

/tr/blog.php

So in the end I should be left with a list of links that start with /tr/blog.php

Thanks for any help. It's really puzzling me.

This is the RegEx I am currently using, but it matches every href on the page.

/href\s*=\s*\"*[^\">]*/ig;
Brock Adams
James Jeffery

3 Answers

You could try something like href=\"(/tr/blog\.php[^"]*)\" (the link is captured in group 1; note the escaped dot, so that it matches a literal "." rather than any character), but in general you should not use regex to parse HTML.
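
As a rough illustration of the capture-group approach above, here is a minimal sketch that runs the pattern over an HTML string and collects group 1 from each match. The helper name `extractBlogHrefs`, its `prefix` parameter, and the sample markup are made up for this example, and it assumes every href is double-quoted.

```javascript
// Hypothetical helper (not from the original answer): returns the href values
// that start with `prefix`, assuming the attributes use double quotes.
function extractBlogHrefs(html, prefix = "/tr/blog.php") {
  // Escape regex metacharacters so the "." in the prefix matches a literal dot.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Group 1 captures the prefix plus everything up to the closing quote.
  const pattern = new RegExp('href\\s*=\\s*"(' + escaped + '[^"]*)"', "gi");
  // matchAll requires the "g" flag and yields one match object per link.
  return Array.from(html.matchAll(pattern), (m) => m[1]);
}

// Sample markup invented for illustration.
const sample =
  '<a href="/tr/blog.php?post=3593&user=930">one</a>' +
  '<a href="/en/about.php">two</a>' +
  '<a href="/tr/blog.php?post=12">three</a>';

console.log(extractBlogHrefs(sample));
```

The same caveat applies: single-quoted or unquoted attributes, or markup inside comments, would need a real HTML parser.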

VeeArr
This is a bit late, but these days you don't even need the regular expression:

document.querySelectorAll("a[href*='/tr/blog.php']") will give you the links whose href contains that string. To find only those whose href begins with that string, use document.querySelectorAll("a[href^='/tr/blog.php']").
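
A small sketch of how the selector approach could be turned into the list of links the question asks for. The helper name `blogHrefs` and its `root` parameter are assumptions for this example; `root` can be `document` or any element. It also assumes the prefix contains no double quotes, since it is spliced directly into the selector.

```javascript
// Hypothetical helper: `root` is anything exposing querySelectorAll,
// such as `document` in a browser.
function blogHrefs(root, prefix = "/tr/blog.php") {
  // ^= matches attribute values that begin with the given string.
  const links = root.querySelectorAll('a[href^="' + prefix + '"]');
  // getAttribute returns the value as written in the markup, whereas
  // the .href property returns the resolved absolute URL.
  return Array.from(links, (a) => a.getAttribute("href"));
}

// In a browser console: console.log(blogHrefs(document));
```

Reading the attribute rather than the `.href` property keeps the results in the same relative form as in the page source.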

Instantiation
<body> <a href="/tr/blog.php?lol">fslk</a> 

<script>

    var anchors = document.getElementsByTagName('a'), captured = [];

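    // Collect every anchor whose href contains tr/blog.php
    for ( var i = 0, l = anchors.length, href, r = /tr\/blog\.php/; i<l; ++i ) {
         // Read from the current anchor; `this` does not refer to it inside a plain loop
         href = anchors[i].href;
         if ( r.test( href ) ) {
             captured.push( anchors[i] );
         }
    }

    // do what you want with the captured links
    for ( var j = captured.length; j--; ) {
        alert( captured[j].href );
    }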

</script>

</body>
meder omuraliev