
I have the following string.

$string = 'Hello there how are <a 
href="http://eem.mydomain.com/2015/06/court-compels-epa-to-
respond.html">some link name</a> there how are there how are 
<a href="http://eem.mydomain.com/2014/03/wv-clean-air-act-case.html">another 
link name</a> ';

I need a PHP function that will convert the URLs in the string to the following URLs.

$new_string = 'Hello there how are <a href="http://eem.mydomain.com/energy-
environment-blog/court-compels-epa-to-respond">some link name</a> there how 
are there how are  
<a href="http://eem.mydomain.com/energy-environment-blog/wv-clean-air-act-
case">another link name</a> ';

In the new URLs, the year and month need to be replaced with 'energy-environment-blog' and the .html extension needs to be removed. Can anyone help me write a pattern that matches the varying year/month in the URL and removes the .html extension? That part is tripping me up.

<?php
$pattern = "";
$replacement = '';
$new_string = preg_replace($pattern, $replacement, $string);
?>
Jeff

1 Answer


It is usually considered better to use a parser (e.g. DOMDocument) instead, but for a quick and dirty replacement you could probably use

https?://\Qeem.mydomain.com\E/\K\d{4}/\d{2}/([^"'>]*?)\.html

and replace this with

energy-environment-blog/$1

See a demo on regex101.com.
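
Plugged into the template from the question, the quick version could look roughly like the sketch below. The sample string is shortened to a single line here for illustration, and `~` is just one possible delimiter choice.

```php
<?php
// Shortened sample input (illustrative): two links on the old /YYYY/MM/slug.html scheme.
$string = 'Hello <a href="http://eem.mydomain.com/2015/06/court-compels-epa-to-respond.html">some link name</a> and '
        . '<a href="http://eem.mydomain.com/2014/03/wv-clean-air-act-case.html">another link name</a>';

// \Q...\E quotes the host literally (so the dots are not wildcards).
// \K drops everything matched so far from the reported match, which means
// only the "YYYY/MM/slug.html" part is replaced and scheme + host stay as they are.
$pattern = '~https?://\Qeem.mydomain.com\E/\K\d{4}/\d{2}/([^"\'>]*?)\.html~';
$replacement = 'energy-environment-blog/$1';

$new_string = preg_replace($pattern, $replacement, $string);

echo $new_string, PHP_EOL;
```

Because of the `\K`, the replacement does not need to repeat `http://eem.mydomain.com/`, and both `http` and `https` links keep their original scheme.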


The safer, parser-based approach would look like this:
<?php

$string = 'Hello there how are <a 
href="http://eem.mydomain.com/2015/06/court-compels-epa-to-
respond.html">some link name</a> there how are there how are 
<a href="http://eem.mydomain.com/2014/03/wv-clean-air-act-case.html">another 
link name</a> ';

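// PHP class names are case-insensitive; DOMDocument / DOMXPath are the canonical spellings.
$dom = new DOMDocument();
// By default, loadHTML() wraps the fragment in <html><body>...</body></html> and adds a doctype.
$dom->loadHTML($string);

$xpath = new DOMXPath($dom);

// \Q...\E quotes the host literally; \K keeps the scheme and host out of the replaced text.
$regex = '~https?://\Qeem.mydomain.com\E/\K\d{4}/\d{2}/([^"\'>]*?)\.html~';
$replacement = 'energy-environment-blog/$1';

// Rewrite only the href attribute of links that point at the blog host.
foreach ($xpath->query("//a[contains(@href, 'eem.mydomain.com')]") as $link) {
    $link->setAttribute('href', preg_replace($regex, $replacement, $link->getAttribute('href')));
}

echo $dom->saveHTML();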
?>

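With the first approach, you run the replacement on the whole string, so any matching text is rewritten, whether or not it sits inside a link. With the second, you run it only on the href attribute of each <a> element. The difference might look subtle in this case, but the second approach is safer.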
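
To make the difference between the two approaches concrete, here is a small sketch. The input is made up for illustration: the same old-style URL appears once as visible text and once inside an href. It assumes the DOM extension is enabled.

```php
<?php
// Hypothetical input: the old URL occurs once as plain text and once in an href.
$html = '<p>Old address: http://eem.mydomain.com/2015/06/some-post.html</p>'
      . '<p><a href="http://eem.mydomain.com/2015/06/some-post.html">link</a></p>';

$regex = '~https?://\Qeem.mydomain.com\E/\K\d{4}/\d{2}/([^"\'>]*?)\.html~';
$replacement = 'energy-environment-blog/$1';

// Approach 1: replace on the raw string. Every match is rewritten,
// including the URL that is only visible text.
$viaString = preg_replace($regex, $replacement, $html);

// Approach 2: replace only inside the href attribute of <a> elements.
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[contains(@href, 'eem.mydomain.com')]") as $link) {
    $link->setAttribute('href', preg_replace($regex, $replacement, $link->getAttribute('href')));
}
$viaDom = $dom->saveHTML();

// Count how many URLs each approach rewrote.
echo substr_count($viaString, 'energy-environment-blog'), PHP_EOL;
echo substr_count($viaDom, 'energy-environment-blog'), PHP_EOL;
```

The string version rewrites both occurrences, while the DOM version leaves the visible text alone and changes only the link target. Which of the two you want depends on your content, but the DOM version is the one that cannot accidentally touch text outside of links.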
For further reference, have a look at the most famous SO answer.

Jan