So what I am trying to do is match, with a regular expression, text that has an opening &lt;p&gt; tag and a closing &lt;/p&gt; tag. This is the code I wrote:

<?php
$input = "&lt;p&gtjust some text&lt;/p&gt more text!";
$input = preg_replace('/&lt;p&gt[^(&lt;\/p&gt)]+?&lt\/;p&gt/','<p>$1</p>',$tem);
echo $input;
?>

So the code does not seem to replace &lt;p&gt with <p> or &lt;/p&gt with </p>. I think the problem is in the part where I am checking for all characters except &lt;/p&gt. I don't think [^(&lt;\/p&gt)] groups the characters correctly: I think it checks whether any one of the characters is absent, not whether the entire sequence is absent. Please help me out here.

just_a_coder
  • `'<p>$1<\/p>,$tem);` -- you're missing a single-quote here. Are you sure that's not the issue? – Amal Murali Dec 19 '13 at 18:16
  • Does your variable contain `&lt;` or `<`? – gen_Eric Dec 19 '13 at 18:17
  • Use some character other than / to delimit your pattern, so you don't have to worry about escaping all the / in the pattern (`'#&lt;p&gt...`). Also, the character class operators [ and ] do not work on sequences/groups, only on individual characters. – Phil Perry Dec 19 '13 at 18:27
  • @Phil Perry how do I check that something is not a sequence of characters, since [] seems to compare only individual characters? – just_a_coder Dec 19 '13 at 18:30
  • If you're looking to replace one of several sequences with their corresponding replacement sequences, you could use an array of target patterns and an array of replacement strings, rather than one horrendous regular expression. – Phil Perry Dec 19 '13 at 18:36

4 Answers

[] in a regex is a character class. You cannot match strings this way, only single characters or Unicode code points.
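
To illustrate the difference, here is a minimal sketch. It assumes a hypothetical sample string whose entities are well-formed (with their semicolons), and it shows two common ways to say "anything up to the closing sequence": a lazy quantifier and a negative lookahead. Neither is necessarily the best fit for real HTML.

```php
<?php
// Hypothetical sample input; well-formed entities are an assumption for illustration.
$input = '&lt;p&gt;just some text&lt;/p&gt; more text!';

// A negated character class excludes single characters, not a sequence:
// [^(&lt;\/p&gt)] rejects any one of ( & l t ; / p g ), so it stops at the
// "t" in "just" and the overall match fails.
$withClass = preg_match('/&lt;p&gt;[^(&lt;\/p&gt)]+?&lt;\/p&gt;/', $input);

// A lazy quantifier grows one character at a time until the closing sequence follows.
$lazy = preg_replace('/&lt;p&gt;(.+?)&lt;\/p&gt;/', '<p>$1</p>', $input);

// A negative lookahead accepts a character only if &lt;/p&gt; does not start there.
$lookahead = preg_replace('/&lt;p&gt;((?:(?!&lt;\/p&gt;).)+)&lt;\/p&gt;/', '<p>$1</p>', $input);

echo $lazy, "\n";
```

Both replacement patterns treat `&lt;/p&gt;` as a whole sequence, which is what the negated character class cannot do.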

If you have escaped HTML entities, you can use htmlspecialchars_decode() to convert them back into characters.

After you have valid HTML, you can use the DOM to parse, traverse and manipulate it. See: How do you parse and process HTML/XML in PHP?
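
A minimal sketch of that approach, assuming a hypothetical sample string with well-formed entities and that PHP's DOM extension is available:

```php
<?php
// Hypothetical sample input; well-formed entities are an assumption for illustration.
$escaped = '&lt;p&gt;just some text&lt;/p&gt; more text!';

// Turn &lt; and &gt; back into < and >.
$html = htmlspecialchars_decode($escaped);

// Parse the decoded markup and read the first paragraph's text through the DOM.
$dom = new DOMDocument();
$dom->loadHTML($html);
$paragraphs = $dom->getElementsByTagName('p');
$text = $paragraphs->item(0)->textContent;

echo $text, "\n";
```

Once the markup is in a DOMDocument, elements can be read or modified without any pattern matching.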

ThW

I think I figured it out. Here is the code:

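<?php
$input = "<p>text</p>";
// Escape the markup so the tags become &lt;p&gt; and &lt;/p&gt;
$tem = htmlspecialchars($input);
// Lazily capture everything between the escaped tags and restore the real tags
$tem = preg_replace('/&lt;p&gt;(.+?)&lt;\/p&gt;/', '<p>$1</p>', $tem);
echo $tem;
?>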
just_a_coder

You don't need to capture the content between the p tags. You only need to replace the p tags themselves:

$html = preg_replace('~&lt;(/?p)&gt;~', '<$1>', $html);

However, you don't need a regex either:

$trans = array('&lt;p&gt;' => '<p>', '&lt;/p&gt;' => '</p>');
$html = strtr($html, $trans);
Casimir et Hippolyte

At least part of the trouble you're having is probably due to the fact that you seem to be playing fast and loose with the semicolons in your HTML entities. They always start with an ampersand, and end with a semicolon. So it's &gt;, not &gt as you have scattered through your post.

That said, why not use html_entity_decode(), which doesn't require abusing regular expressions?

$string = 'shoop &lt;p&gt;da&lt;/p&gt; woop';
echo html_entity_decode($string);
// output: shoop <p>da</p> woop
Sammitch