1

I'm a self-taught PHP programmer and I'm only now starting to grasp regular expressions. I'm aware of how powerful they are when done right, but this is something I still need to dive into, so maybe someone can help me and save me some hours of experimenting.

I have this string:

here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...

Now I need to run preg_match on this string, find the <a href> tag that has an image inside it, and replace it with the same tag with one small difference: right after the title attribute inside the tag, I want to add a rel="here" attribute. Of course, it should ignore links (<a href>s) that don't have an <img> tag inside.

Toto
Asaf Chertkoff

3 Answers

6

First of all: never, ever, ever use regex for HTML!

You're much better off using an XML parser: create a DOMDocument, load your HTML, and then use XPath to get the node you want.

Something like this:

$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';
$doc = new DOMDocument();
$doc->loadHTML($str);
$xpath = new DOMXPath($doc);
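$results = $xpath->query('//a/img');
foreach ($results as $result) {
    // $result is the <img>; its parent is the <a> that needs the extra attribute
    $result->parentNode->setAttribute('rel', 'here');
}
echo $doc->saveHTML();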
Community
Peter Kruithof
  • Aren't you kind of missing the point of his question? He's saying he is using it as an exercise to learn some regex. But yes, don't use regex for HTML. – Hugoagogo Jun 30 '11 at 09:04
  • Well, if you should never parse HTML with regex, then trying to parse HTML with a regex to learn regexes isn't the best way to learn regexes. Can anyone suggest a better learning exercise for regex? – Jeff Welling Jun 30 '11 at 09:19
  • @Hugoagogo Seems to me he has an actual problem he needs to solve. The fact that he needs help with regular expressions rests on the assumption that regexes are the best way to solve his problem. While this particular problem _could_ be solved with a regex, using a DOMDocument is much better and more future-proof. – Peter Kruithof Jun 30 '11 at 10:50
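Since the comments note the problem _could_ be solved with a regex, here is a deliberately narrow sketch of that route (not taken from any of the answers). It assumes the link already has a double-quoted title attribute and that the <img> is the link's only content; it uses preg_replace rather than preg_match because the goal is to rewrite the string, not just test it.

```php
<?php
// The string from the question (with the alt attribute's quotes made consistent).
$str = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';

// Group 1: the opening <a ...> up to and including a double-quoted title attribute.
// Group 2: the rest of the opening tag, an <img> as the link's only content, then </a>.
// [^>] keeps each part of the match inside a single tag, so a plain link such as
// <a href="#not"> cannot be stretched to reach a later <img>.
$pattern = '~(<a\b[^>]*?\btitle="[^"]*")([^>]*>\s*<img\b[^>]*>\s*</a>)~i';

// Re-emit both groups with rel="here" inserted right after the title attribute.
$result = preg_replace($pattern, '$1 rel="here"$2', $str);

echo $result, "\n";
```

Anything outside those assumptions (single-quoted attributes, no title, text next to the image inside the link) is silently skipped, which is the practical reason the answers recommend DOMDocument instead.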
1

Ideally you should use an HTML (or XML) parser for this purpose. Here is an example using PHP's built-in DOM extension:

<?php
error_reporting(E_ALL);
$doc = new DOMDocument();
$doc->loadHTML('<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>');
$xpath = new DOMXPath($doc);
$result = $xpath->query('//a[img]');
foreach ($result as $r) {
    $r->setAttribute('rel', $r->getAttribute('title')); // unclear whether you want a hard-coded "here" or the title's value; this copies the title
}
echo $doc->saveHTML();

Output

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><body>
<p>here is the <a href="http://www.google.com" class="ttt" title="here" rel="here"><img src="http://www.somewhere.com/1.png" alt="some"></a> and there is <a href="#not">not</a> a chance...</p>
</body></html>
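Passing the bare fragment from the question straight to loadHTML() also works, but saveHTML() then returns it inside a doctype and <html><body> (libxml may add other implied elements as well). If you want the fragment back on its own, one possible approach, sketched here under that assumption, is to wrap it in a <div> with a placeholder id (fragment-wrapper is made up for this example) and serialize only that div's children; saveHTML() accepts a node argument since PHP 5.3.6. This version hard-codes rel="here" as the question asked.

```php
<?php
// Fragment from the question (with the alt attribute's quotes made consistent).
$fragment = 'here is the <a href="http://www.google.com" class="ttt" title="here"><img src="http://www.somewhere.com/1.png" alt="some" /></a> and there is <a href="#not">not</a> a chance...';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect libxml warnings about sloppy HTML instead of printing them
// "fragment-wrapper" is a hypothetical id used only to find the fragment again.
$doc->loadHTML('<div id="fragment-wrapper">' . $fragment . '</div>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Only <a> elements that have an <img> child; plain links are left untouched.
foreach ($xpath->query('//a[img]') as $link) {
    $link->setAttribute('rel', 'here');
}

// Serialize the wrapper's children one by one, skipping the <html><body> libxml added.
$wrapper = $xpath->query('//div[@id="fragment-wrapper"]')->item(0);
$out = '';
foreach ($wrapper->childNodes as $child) {
    $out .= $doc->saveHTML($child);
}

echo $out, "\n";
```

Like the output above, the serializer writes the image as an HTML-style <img ...> without the trailing slash.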
Salman A
  • Thanks. Another thing that helped me, because I needed to work with UTF-8, is this: http://stackoverflow.com/questions/3872423/php-problem-with-russian-language – Asaf Chertkoff Jul 04 '11 at 14:14
  • And after saveHTML I needed to convert it back to UTF-8: ` $newcontent = mb_convert_encoding($newcontent, "UTF-8", 'HTML-ENTITIES');` – Asaf Chertkoff Jul 04 '11 at 14:40
0

Here are a couple of links that might help you with regex:

RegEx Tutorial

Email Samples of RegEx

I used the website in the last link extensively at my previous job. It is a great collection of regexes that you can also test against your own cases. The first two links should help you gain some further knowledge about regex.

Francesco