I have an HTML page:

<tr>
<td rowspan="7">
<a href="http://www.link1.com/" style="text-decoration: none;">
        <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
        </a>
    </td>
    <td colspan="2" rowspan="2">
        <a href='http://www.link1.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>
<td colspan="2" rowspan="2">
        <a href='http://www.url.com/test.php?c=1'>
        <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
        </a>
    </td>

I want to replace every URL in an href attribute with mytest.com?url=$link

I tried:

    $messaget = preg_replace('/<a(.*)href="([^"]*)"(.*)>/','mytest.com?url=$2',$messaget);
Yobogs
  • PHP is server-side code, so I'm not sure what you're trying to accomplish or how. – adamdehaven Aug 29 '13 at 16:05
  • Sorry, my HTML code is in the variable $messaget. – Yobogs Aug 29 '13 at 16:13
  • You should never use regex for dealing with HTML code; use an HTML parser instead. See http://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php and http://simplehtmldom.sourceforge.net/. – Technoh Aug 29 '13 at 16:17

4 Answers

1

This may help you in the short run:

$messaget = preg_replace('/<a ([^>]*)href=[\'"]([^\'"]*)[\'"]([^>]*)>/', '<a $1href="mytest.com?url=$2"$3>', $messaget);

In your regex you were using href="...", that is, double quotes, but in your HTML you have a mixture of both double and single quotes.

And in the replacement string you forgot to include $1 and $3.
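
As a rough sketch of the same idea, the variant below uses a backreference so that the closing quote must be the same character as the opening one, and it URL-encodes the original link before appending it. The function name rewrite_hrefs and the sample string are hypothetical, and the pattern is only an approximation: it would also match attributes that merely end in "href".

```php
<?php
// Hypothetical helper: routes every <a ... href="..."> through mytest.com.
function rewrite_hrefs(string $html): string
{
    // (["\']) captures the opening quote and \2 requires the same closing quote.
    // [^>]* keeps each match inside a single tag.
    $pattern = '/<a\s([^>]*?)href=(["\'])(.*?)\2([^>]*)>/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1]: attributes before href, $m[3]: original URL, $m[4]: attributes after href
        return '<a ' . $m[1] . 'href="mytest.com?url=' . urlencode($m[3]) . '"' . $m[4] . '>';
    }, $html);
}

// Sample input assumed for illustration, modeled on the question's markup.
$sample = "<a href='http://www.url.com/test.php?c=1'>\n<img src=\"image1.jpg\" />\n</a>";
echo rewrite_hrefs($sample), "\n";
```

Using preg_replace_callback instead of a plain replacement string is what makes the urlencode() call possible here.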

That said, DO NOT use regex to parse HTML. The answer by @BenLanc below is better, use that instead. Read the link he posted.

janos
  • His regex works only for double quotes; he needs to account for both double and single quotes, as his sample shows. So the regex you've provided still won't do the job, though you did fix the replacement error. – Kieran Aug 29 '13 at 16:17
  • You're right. I didn't notice earlier that he also had double quotes; I only saw single quotes. Thanks, will fix now. – janos Aug 29 '13 at 16:19
  • @janos It's cool, undid the downvote and deleted my comment when you updated your answer – BenLanc Aug 30 '13 at 09:19
1

Don't use regex on HTML; HTML is not a regular language.

Assuming your markup is valid (and if it's not, pass it through Tidy first), you should use XPath to grab the elements and then update the href directly. For example:

<?php
$messaget = <<<XML
<tr>
  <td rowspan="7">
    <a href="http://www.link1.com/" style="text-decoration: none;">
      <img src="image1.jpg" width="34" height="873" alt="" style="display:block;border:none" />
    </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.link1.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
  <td colspan="2" rowspan="2">
      <a href='http://www.url.com/test.php?c=1'>
      <img src="image1.jpg" width="287" height="146" alt="" style="display:block;border:none" />
      </a>
  </td>
</tr>
XML;

$xml   = new SimpleXMLElement($messaget);

// Select all "a" tags with href attributes
$links = $xml->xpath("//a[@href]");

// Loop through the links and update the href, don't forget to url encode the original!
foreach($links as $link)
{
  // Cast the attribute to a string explicitly before encoding it
  $link["href"] = sprintf("mytest.com/?url=%s", urlencode((string) $link['href']));
}

// Return your HTML with transformed hrefs!
$messaget = $xml->asXml();
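
If the markup is not well-formed XML, SimpleXMLElement will refuse to load it. As an alternative sketch of the same XPath approach, assuming the DOM extension is available, DOMDocument::loadHTML() tolerates sloppier HTML. The helper name rewrite_hrefs_dom and the sample input are hypothetical.

```php
<?php
// Hypothetical helper: rewrites hrefs in an HTML fragment using DOMDocument.
function rewrite_hrefs_dom(string $html): string
{
    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them for imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select all "a" elements that have an href attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@href]') as $link) {
        $original = $link->getAttribute('href');
        $link->setAttribute('href', 'mytest.com/?url=' . urlencode($original));
    }

    // loadHTML() wraps fragments in <html><body>; serialize only the body's children.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $html;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo rewrite_hrefs_dom("<p><a href='http://www.url.com/test.php?c=1'>link</a></p>"), "\n";
```

Note that loadHTML() may rearrange a fragment to fit its own idea of a valid document, and non-ASCII input may need an explicit charset declaration, so the output is worth checking against real data.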
BenLanc
0

A regex to match a URL:

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/  
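
Because this pattern is anchored with ^ and $, it tests whether a whole string looks like a URL rather than finding URLs inside markup. Its path part also has no "?" or "=", so URLs with query strings, like some in the question, are rejected. A quick check, using candidate strings taken from the question (the variable names are only for illustration):

```php
<?php
// The pattern above, unchanged, in a single-quoted PHP string.
$urlPattern = '/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/';

$candidates = [
    'http://www.link1.com/',              // plain URL
    'http://www.link1.com/test.php?c=1',  // URL with a query string
];

foreach ($candidates as $candidate) {
    $matched = preg_match($urlPattern, $candidate) === 1;
    printf("%-36s %s\n", $candidate, $matched ? 'match' : 'no match');
}
```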

More background info

lordkain
0

Don't forget the /m modifier at the end of your regexp, since you are using multiline source:

PHP Doc PCRE
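
For context, here is a quick sketch of what the PCRE modifiers change: /m only alters where ^ and $ match, while /s is the modifier that lets "." span newlines. The sample string is made up for illustration.

```php
<?php
// Made-up sample: an opening tag split across two lines.
$text = "<a\nhref='x'>";

$results = [
    'dot, no modifier'   => preg_match('/<a.*href/', $text),   // "." stops at the newline
    'dot with /m'        => preg_match('/<a.*href/m', $text),  // /m does not affect "."
    'dot with /s'        => preg_match('/<a.*href/s', $text),  // /s lets "." match "\n"
    'caret, no modifier' => preg_match('/^href/', $text),      // "^" only at start of subject
    'caret with /m'      => preg_match('/^href/m', $text),     // "^" also after each newline
];

foreach ($results as $label => $matched) {
    printf("%-20s %s\n", $label, $matched ? 'match' : 'no match');
}
```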

robinef