With preg_replace in PHP, I am trying to match a regex pattern multiple times in a string. Sometimes there will be two matches on one line, sometimes not.

I have the following string:

 $text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

I would like it to convert to:

 Check
 text1
 link1
 or
 text2
 link2
 oh
 well
 text3
 link3

I have this:

 $text = preg_replace('/(<a href=")(.+)(">)(.*)(<\/a>)/', "\n$4\n$2\n", $text);

But it only works when there is a single match per line, like:

 $text = 'Check <a href="link1">text1</a> 
 or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>'; 

Any help appreciated.

Example with a and b http://www.phpliveregex.com/p/4fU

Sanne
  • Does `link1` and `text1` really need to be in reverse order in your result string? – aliteralmind Mar 17 '14 at 13:24
  • Well, preferable, but not 100% necessary. It's more clear to a user. I am trying to create a plain-text mail that will be parsed for a HTML mail. – Sanne Mar 17 '14 at 13:28

2 Answers

1

Iterate over all text nodes you can find inside the given HTML, and treat text nodes whose parent is an anchor as a special case:

$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

$dom = new DOMDocument;
$dom->loadHTML($text);

$xpath = new DOMXPath($dom);

// Visit every text node in document order.
foreach ($xpath->query('//text()') as $node) {
    if ($node->nodeType == XML_TEXT_NODE) {
        echo $node->textContent, "\n";
        // If the text sits directly inside an <a>, print its href on the next line.
        if ($node->parentNode->nodeType == XML_ELEMENT_NODE && $node->parentNode->nodeName == 'a') {
            echo $node->parentNode->getAttribute('href'), "\n";
        }
    }
}

If you treat the input purely as text, you would do it like this:

echo preg_replace('~<a href="([^"]+)">([^<]+)</a>~i', "\n\$2\n\$1\n", $text);

Basically, you use negated character classes ([^"]+ for the href value and [^<]+ for the tag contents) instead of plain .+ and .*, because those quantifiers are greedy by default and will run on to the last "> and </a> on the line. You can make them lazy by writing .+? and .*? respectively, but a negated character class leads to less backtracking.

Also, you only need capture groups around two parts of the anchor (the href value and the link text), not all five of them.
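
To make the difference concrete, here is a minimal sketch that runs the three variants on a single line holding two anchors and counts the replacements through the fifth argument of preg_replace. The variable names are only for illustration, and the input is a shortened version of the string from the question.

```php
<?php
// One line with two anchors, as in the question.
$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>';

// Greedy: (.+) runs on to the last "> on the line, so both anchors
// are swallowed by a single match.
$greedy = preg_replace('/<a href="(.+)">(.*)<\/a>/', "\n\$2\n\$1\n", $text, -1, $greedyCount);

// Lazy: (.+?) and (.*?) stop at the first place the rest of the pattern fits.
$lazy = preg_replace('/<a href="(.+?)">(.*?)<\/a>/', "\n\$2\n\$1\n", $text, -1, $lazyCount);

// Negated character classes: the href cannot contain a quote and the
// link text cannot contain "<", so a match cannot spill into the next anchor.
$negated = preg_replace('~<a href="([^"]+)">([^<]+)</a>~i', "\n\$2\n\$1\n", $text, -1, $negatedCount);

echo "greedy: $greedyCount, lazy: $lazyCount, negated: $negatedCount\n";
echo $negated;
```

The greedy pattern reports one replacement, while the lazy and negated versions each report two and produce the same string here. They would stop agreeing once the link text itself contains tags, since [^<]+ refuses to match across a "<".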

Ja͢ck
  • Thank you, but I would like to understand how to solve this with preg_replace – Sanne Mar 17 '14 at 13:42
  • Since I can reuse this pattern for non-HTML situations if I understand it. – Sanne Mar 17 '14 at 13:47
  • Thanks, but now I see an issue when the enclosed text1, text2, text3 has HTML tags as well `text1` – Sanne Mar 17 '14 at 13:56
  • @Sanne That's what I'm telling you that this is not a text problem. Problems in HTML are solved in another way. – Ja͢ck Mar 17 '14 at 13:57
  • Your other code also has problems with enclosing tags. The link gets removed. – Sanne Mar 17 '14 at 14:58
  • Just so you know - I opened another question specifically about the DOM solution. I understand that this is the way to go when replacing stuff in HTML.. Thanks – Sanne Mar 17 '14 at 15:12
-2

Not specifically for your problem, but you can add modifiers to a regex pattern after the closing delimiter:

preg_replace('/whatever_my_pattern_do/MODIFIERS',"here I replace", $text);

You should check them all here
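
As a hedged illustration of what a modifier can do, the sketch below applies PCRE's U (ungreedy) modifier to the pattern from the question. U inverts the default greediness of the quantifiers, so .+ and .* behave like .+? and .*?. Whether this is preferable to rewriting the pattern is a judgment call; the variable names here are only for illustration.

```php
<?php
// The multi-line string from the question: two anchors on the first line.
$text = 'Check <a href="link1">text1</a> or <a href="link2">text2</a>
 oh
 well <a href="link3">text3</a>';

$pattern = '(<a href=")(.+)(">)(.*)(<\/a>)';

// Without modifiers the quantifiers are greedy, so the two anchors on the
// first line are consumed by a single match.
$default = preg_replace('/' . $pattern . '/', "\n\$4\n\$2\n", $text, -1, $defaultCount);

// With U the quantifiers become lazy and each anchor is matched on its own.
$ungreedy = preg_replace('/' . $pattern . '/U', "\n\$4\n\$2\n", $text, -1, $ungreedyCount);

echo "default: $defaultCount, with U: $ungreedyCount\n";
echo $ungreedy;
```

Here the unmodified pattern makes two replacements and the U version makes three, one per anchor.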

Olvathar