What do these regular expressions mean in this code?

Question

I was trying to change some parts of a Joomla plugin when I came across this part of it, and I have no idea what it's doing.

Can someone please explain to me what these regular expressions and those ${4} do?

    $comStart = '';
    $comEnd = '';

    $output = JResponse::getBody();
    $output = preg_replace('/\<meta name=\"og\:/', '<meta property="og:', $output);
    $output = preg_replace('/\<meta name=\"fb:admins/', '<meta property="fb:admins', $output);
    $output = preg_replace('/<(\w+) (\w+)="(\w+):(\w+)" (\w+)="([a-zA-Z0-9\ \_\-\:\.\&\/\,\=\!\?]*)" \/>/i', $comStart.'<${1} ${2}="${3}:${4}" ${5}="${6}" >'.$comEnd, $output);

FYI: This plugin displays Facebook and Open Graph tags inside articles.

It's making me bleed from the eyes ... they replace wrong elements with their correct counterparts ... the `${4}` is a back reference — Ja͢ck, Jun 12 '12 at 15:03
those "${4}" are backreferences. You can look at : http://www.php.net/manual/en/function.preg-replace.php --> parameters --> replacement — rap-2-h, Jun 12 '12 at 15:05
Oh god! I don't know anything about regexes. Where should I start? — Farid Rn, Jun 12 '12 at 15:11
@faridv you should start by throwing them away and see if everything still works :D — Ja͢ck, Jun 12 '12 at 15:12
People, I'm impressed with your speed of answering, I'm wondering which answer I should mark as accepted!? — Farid Rn, Jun 12 '12 at 15:18
@Jack I just did what you said and removed all those regex codes and now ` — Farid Rn, Jun 12 '12 at 15:41
@Jack first and second regex was trying to replace `name` property of meta tag to `property`. I deleted it and so now I have ` — Farid Rn, Jun 12 '12 at 16:06

score 3 · Accepted Answer · edited Jun 20 '20 at 09:12

SERIOUS NOTE!

The use of regular expressions to parse/match HTML/XML is highly discouraged. Seriously, don't do it.

Basically, it's a regular expression to parse/match HTML, which may have the slight side effects of not working, being hard to maintain, and causing insanity.

The `${N}` ones are called back-references; each one refers to whatever the Nth pair of parentheses (capture group) in the regular expression matched.
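To make the numbering concrete, here is a small sketch. The sample tag and the bracketed replacement format are made up for illustration; only the first four groups of the plugin's pattern are used.

```php
<?php
// Hypothetical sample input, not taken from the plugin.
$input = '<meta property="og:title" content="Hello" />';

// Groups are numbered by their opening parenthesis, left to right.
$pattern = '/<(\w+) (\w+)="(\w+):(\w+)"/';

// ${1}..${4} are replaced by what groups 1..4 captured.
$result = preg_replace($pattern, '[${1}] [${2}] [${3}] [${4}]', $input);

echo $result, "\n";
// prints: [meta] [property] [og] [title] content="Hello" />
```

Only the matched part of the string is replaced; the rest of the tag is left as it was.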

If you need to manipulate HTML strings in PHP, you should use the DOMDocument class, which was made exactly for this.

Example

    <?php

    $html_string = <<<HTML
    <!DOCTYPE HTML>
    <html lang="en-US">
    <head>
      <meta charset="UTF-8">
      <title></title>
    </head>
    <body>

      <div id="target">
        This is the target DIV! <span>This span will change texts!</span>
      </div>

    </body>
    </html>
    HTML;

    $dom = new DOMDocument();
    // Load HTML from a string
    $dom->loadHTML($html_string);

    // Retrieve the target and span elements
    $target = $dom->getElementById("target");
    $span = $target->getElementsByTagName("span")->item(0);

    // Remove the text; firstChild is the text node
    $span->removeChild($span->firstChild);
    // Append the new text
    $span->appendChild(new DOMText("This is the new text!"));
    // Change an attribute
    $span->setAttribute("class", "spanny");

    // Save the HTML back to a string
    $html_string = $dom->saveHTML();

    echo $html_string;

Regular expressions aren't bad, evil, or scary; they are simply the wrong tool for this job. You don't drive a nail with a jackhammer, do you?
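Applied to the question's actual task, the same approach might look roughly like the sketch below. The sample markup is invented, and the prefix checks on `og:` and `fb:admins` are my reading of what the plugin's first two regexes target, not something the plugin author confirmed.

```php
<?php
// Hypothetical page markup; in the plugin this would come from JResponse::getBody().
$html = '<!DOCTYPE html><html><head><meta charset="UTF-8">'
      . '<meta name="og:title" content="My page">'
      . '<meta name="fb:admins" content="12345">'
      . '<meta name="description" content="Untouched">'
      . '<title>Demo</title></head><body></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('meta') as $meta) {
    // getAttribute() returns an empty string when the attribute is missing.
    $name = $meta->getAttribute('name');

    // Assumption: only names starting with "og:" or "fb:admins" should move.
    if (strpos($name, 'og:') === 0 || strpos($name, 'fb:admins') === 0) {
        $meta->removeAttribute('name');
        $meta->setAttribute('property', $name);
    }
}

$result = $dom->saveHTML();
echo $result;
```

Unlike the regex version, this does not care about attribute order, spacing, or whether the tag ends in `/>`. Note that `saveHTML()` may reorder attributes and normalise the markup.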

Thanks man. Can you explain me how can I change content of that target `span` with something else using DOM tool? — Farid Rn, Jun 12 '12 at 15:43

score 3 · Answer 2 · answered Jun 12 '12 at 15:07

    $output = preg_replace('/\<meta name=\"og\:/', '<meta property="og:', $output);

Replaces the string `<meta name="og:` with `<meta property="og:`. Kind of pointless: the pattern contains nothing but literal text, so a regex is not needed here.

    $output = preg_replace('/\<meta name=\"fb:admins/', '<meta property="fb:admins', $output);

Replaces `<meta name="fb:admins` with `<meta property="fb:admins`. Just as pointless: a regex is not needed here either.
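Since both patterns are literal text, a plain `str_replace` should behave the same. A minimal sketch, with made-up sample markup standing in for the real page body:

```php
<?php
// Hypothetical sample markup standing in for JResponse::getBody().
$output = '<meta name="og:title" content="T" /><meta name="fb:admins" content="1" />';

// Plain string replacement: no pattern syntax, nothing to escape.
$output = str_replace(
    array('<meta name="og:', '<meta name="fb:admins'),
    array('<meta property="og:', '<meta property="fb:admins'),
    $output
);

echo $output, "\n";
// prints: <meta property="og:title" content="T" /><meta property="fb:admins" content="1" />
```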

    $output = preg_replace('/<(\w+) (\w+)="(\w+):(\w+)" (\w+)="([a-zA-Z0-9\ \_\-\:\.\&\/\,\=\!\?]*)" \/>/i', $comStart.'<${1} ${2}="${3}:${4}" ${5}="${6}" >'.$comEnd, $output);

Replaces a string like `<word1 word2="word3:word4" word5="word6withspecialcharacterslike-:.etc." />` with `<word1 word2="word3:word4" word5="word6withspecialcharacterslike-:.etc." >`. So it only removes the trailing slash before the closing `>`. Very suspect and voodoo-like use of regex.
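Running the third expression on two sample tags shows both the effect and how brittle it is. The tags are invented for illustration; the pattern and replacement are copied from the question.

```php
<?php
$comStart = '';
$comEnd = '';

// Pattern and replacement exactly as in the plugin code from the question.
$pattern = '/<(\w+) (\w+)="(\w+):(\w+)" (\w+)="([a-zA-Z0-9\ \_\-\:\.\&\/\,\=\!\?]*)" \/>/i';
$replacement = $comStart . '<${1} ${2}="${3}:${4}" ${5}="${6}" >' . $comEnd;

// Hypothetical inputs: the second value contains "%" and parentheses,
// which are not in the allowed character class.
$matching = '<meta property="og:title" content="My page" />';
$nonMatching = '<meta property="og:title" content="50% off (today)" />';

$a = preg_replace($pattern, $replacement, $matching);
$b = preg_replace($pattern, $replacement, $nonMatching);

echo $a, "\n"; // prints: <meta property="og:title" content="My page" >
echo $b, "\n"; // prints the input unchanged, because the pattern does not match
```

A single unexpected character in the attribute value is enough for the tag to be skipped silently.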

Also, all those regexes are highly inelegant (lots of pointless escapes, for example) and show that whoever wrote those doesn't know much about regexes. Letting something like this loose on HTML is asking for trouble.

AVOID! AVOID! AVOID!

I don't know anything about regexes either! I think it's easier for me to delete all those regexes and find a better way to do that instead. — Farid Rn, Jun 12 '12 at 15:14

score 2 · Answer 3 · answered Jun 12 '12 at 15:07

Each `(\w+)` says "find a word and store it". So you are doing this (in pseudocode):

find /<(word1) (word2)="(word3):(word4)" (word5)="(manypossiblechars6)" \/>/ ignoring case

replace the match with $comStart.<word1 word2="word3:word4" word5="manypossiblechars6" >.$comEnd

score 2 · Answer 4 · answered Jun 12 '12 at 15:08

The first one tries to replace tags of the form <meta name="og:... with <meta property="og:...

The second similarly replaces tags starting <meta name="fb:admins... with <meta property="fb:admins...

Finally, the third seems to take tags of the form `<word word="word:word" word="something" />` and wrap them with $comStart and $comEnd.

This is done by matching the parts of the tag (placing () around them) and then using backreferences such as ${4} to refer to the 4th matched part.

Here $comStart and $comEnd are set to '' so that seems a little pointless. It also manages to get rid of the closing slash for the tag at the same time, though who knows if that is intentional!

score 2 · Answer 5 · answered Jun 12 '12 at 15:11

Those expressions attempt to fix the document head code by:

rewriting `<meta name="og:*"` to `<meta property="og:*"`
rewriting `<meta name="fb:admins"` to `<meta property="fb:admins"`
rewriting meta tags with a dangling slash to ones without it (assuming the tag always has exactly two attributes).

This is just horrendous code, and as long as your templates don't have those "mistakes" in them, you can throw this crap away.
