
This is my code:

$string = '<a href="http://www.mysite.com/test" class="prevlink">&laquo; Previous</a><a href=\'http://www.mysite.com/test/\' class=\'page\'>1</a><span class=\'current\'>2</span><a href=\'http://www.mysite.com/test/page/3/\' class=\'page\'>3</a><a href=\'http://www.mysite.com/test/page/4/\' class=\'page\'>4</a><a href="http://www.mysite.com/test/page/3/" class="nextlink">Next &raquo;</a>';
$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');
$string = preg_replace('@(&lt;a).*?(nextlink)@s', '', $string);
echo $string;

I am trying to remove the last link:

<a href="http://www.mysite.com/test/page/3/" class="nextlink">Next &raquo;</a>

My current output:

">Next &raquo;</a>

It removes everything from the start of the string. I want it to remove only the link that contains nextlink. Is this possible with preg_replace, and if so, how? Thanks.

Muazam

2 Answers


Note: This is not a direct answer, but a suggestion to another approach.

I was once told: if you can do it any other way, stay away from regex. I don't, though; it's my white whale. Have you heard of phpQuery? It's jQuery implemented in PHP, and it's very powerful. It would be able to do what you want quite easily. I know it's not regex, but perhaps it's of use to you.

If you really want to go ahead with regex, I can recommend http://gskinner.com/RegExr/. I think it's a great tool.
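A rough sketch of the non-regex route, for comparison. It uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than phpQuery, so it is a swapped-in technique and not the phpQuery API. The shortened $html sample and the wrapper div with id "wrap" are assumptions made for illustration, and the code works on the raw markup, before htmlspecialchars is applied.

```php
<?php
// Shortened stand-in for the pagination markup in the question (hypothetical sample data).
$html = '<a href="/test/" class="page">1</a>'
      . '<a href="/test/page/3/" class="nextlink">Next</a>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Wrap the fragment in a div (hypothetical id "wrap") so it can be found again later.
$doc->loadHTML('<div id="wrap">' . $html . '</div>');

$xpath = new DOMXPath($doc);

// Select every <a> whose class list contains the token "nextlink".
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " nextlink ")]';

// Copy the node list to an array first, so removing nodes does not disturb iteration.
foreach (iterator_to_array($xpath->query($query)) as $link) {
    $link->parentNode->removeChild($link);
}

// Serialise only the children of the wrapper, dropping the html/body scaffolding.
$wrap  = $xpath->query('//div[@id="wrap"]')->item(0);
$clean = '';
foreach ($wrap->childNodes as $child) {
    $clean .= $doc->saveHTML($child);
}

echo $clean, "\n";
```

If the result is meant to be displayed as text, as in the question, htmlspecialchars can be applied to $clean afterwards.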

Simon Paarlberg

This is quite a tricky question to solve.

First off, the .*? will not match the way you are expecting it to.

The engine starts from the left, finds the first match for <a (escaped as &lt;a here), and then searches forward until it finds nextlink, which essentially picks up the entire string up to that point. Non-greedy only makes the match end as early as possible; it does not move the starting point.

For that regex to work as you wanted, it would need to match from the right-hand side first and work backwards through the string, finding the smallest (non-greedy) match.
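To make the left-to-right behaviour concrete, here is a small sketch. It first captures what the original pattern matches, then tries one possible single-pattern alternative: a negative lookahead that stops the match from stepping over another &lt;a. The shortened sample markup and the $tempered pattern are assumptions for illustration, not part of the original code.

```php
<?php
// Shortened, already-escaped stand-in for the markup in the question (hypothetical sample data).
$escaped = htmlspecialchars(
    '<a href="/test/" class="page">1</a><a href="/test/page/3/" class="nextlink">Next</a>',
    ENT_COMPAT,
    'UTF-8'
);

// Original pattern: the match begins at the FIRST "&lt;a" and lazily extends to "nextlink",
// so it swallows the first link as well.
preg_match('@(&lt;a).*?(nextlink)@s', $escaped, $m);
$originalMatch = $m[0];

// Tempered pattern (an assumption, not from the original answer):
// (?:(?!&lt;a).)*? consumes one character at a time but refuses to step over
// another "&lt;a", so a match can only start at the link that contains "nextlink".
$tempered = '@&lt;a(?:(?!&lt;a).)*?nextlink.*?&lt;/a&gt;@s';
$result   = preg_replace($tempered, '', $escaped);

echo "ORIGINAL MATCH: $originalMatch\n";
echo "TEMPERED RESULT: $result\n";
```

With this sample, only the first link's escaped markup should remain in $result. Note that &lt;a would also match the start of other tags beginning with "a", so treat the pattern as a sketch and not as a general HTML parser.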

I couldn't see any modifiers that would do this, so I opted for a callback on each link that checks for nextlink and removes any link containing it:

<?php
$string = '<a href="http://www.mysite.com/test" class="prevlink">&laquo; Previous</a><a href=\'http://www.mysite.com/test/\' class=\'page\'>1</a><span class=\'current\'>2</span><a href=\'http://www.mysite.com/test/page/3/\' class=\'page\'>3</a><a href=\'http://www.mysite.com/test/page/4/\' class=\'page\'>4</a><a href="http://www.mysite.com/test/page/3/" class="nextlink">Next &raquo;</a>';

echo "RAW: $string\r\n\r\n";

$string = htmlspecialchars($string, ENT_COMPAT, 'UTF-8');

echo "SRC: $string\r\n\r\n";


// Visit each escaped <a ...>...</a> in turn and let the callback decide what to keep.
$string = preg_replace_callback(
    '@&lt;a.+?&lt;/a&gt;@',
    'remove_nextlink',
    $string
);

// Callback: return '' to drop the link, or the matched text to keep it unchanged.
function remove_nextlink($matches) {
    // If you want to see each link as it is processed, uncomment this:
    // echo "L: $matches[0]\r\n\r\n";

    if (strpos($matches[0], 'nextlink') === false) {
        return $matches[0]; // doesn't contain nextlink, put the original string back
    }

    return ''; // contains nextlink, replace with blank
}

echo "PROCESSED: $string\r\n\r\n";
bumperbox
  • Thanks for the explanation, I have to learn pregEx it seems very powerful. The code works, thanks a lot. =) – Muazam Aug 31 '11 at 09:40
  • It's not the most elegant solution, but it does work, and writing regex to work on HTML is always hard work – bumperbox Aug 31 '11 at 09:43