Unable to understand how to match all characters except a given sequence with preg_replace() in PHP

Question

I am trying to match text that has an opening &lt;p&gt; tag and a closing &lt;/p&gt; tag. This is the code I wrote:

<?php
$input = "&lt;p&gtjust some text&lt;/p&gt more text!";
$input = preg_replace('/&lt;p&gt[^(&lt;\/p&gt)]+?&lt\/;p&gt/','<p>$1</p>',$tem);
echo $input;
?>

The code does not seem to replace &lt;p&gt with <p> or &lt;/p&gt with </p>. I think the problem is in the part where I am checking for all characters except '&lt;/p&gt'. I don't think the code [^(&lt;\/p&gt)] is grouping the characters correctly. I think it checks whether any one of the characters is absent, not whether the entire sequence of characters is absent. Please help me out here.

`'<p>$1<\/p>,$tem);` -- you're missing a single-quote here. Are you sure that's not the issue? – Amal Murali, Dec 19 '13 at 18:16
Use some character other than / to delimit your pattern, so you don't have to worry about escaping every / in the pattern (`'#&lt;p&gt...`). Also, the character class operators [ and ] do not work on sequences/groups, only on individual characters. – Phil Perry, Dec 19 '13 at 18:27
@Phil Perry how do I check that something is not a sequence of characters, since [] seems to compare only individual characters? – just_a_coder, Dec 19 '13 at 18:30
If you're looking to replace one of several sequences with their corresponding replacement sequences, you could use an array of target patterns and an array of replacement strings, rather than one horrendous regular expression. — Phil Perry, Dec 19 '13 at 18:36
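To illustrate the two suggestions in the comments above (a delimiter other than /, and arrays of patterns and replacements), here is a minimal sketch. The sample input is an assumption: it is the question's string with the entity semicolons put back in.

```php
<?php
// Assumed sample input: the question's string with properly terminated entities.
$input = '&lt;p&gt;just some text&lt;/p&gt; more text!';

// '#' is the delimiter, so the '/' in the closing tag needs no escaping.
$patterns     = array('#&lt;p&gt;#', '#&lt;/p&gt;#');
$replacements = array('<p>', '</p>');

// Each pattern is replaced by the replacement string at the same index.
$output = preg_replace($patterns, $replacements, $input);

echo $output, "\n"; // <p>just some text</p> more text!
```

Since each target here is a fixed string, this does not need to match "everything except a sequence" at all, which sidesteps the original problem.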

score 0 · Answer 1 · edited May 23 '17 at 10:25

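[] in a regex is a character class. You cannot match strings this way, only individual characters or Unicode code points. A negated class such as [^(&lt;\/p&gt)] therefore means "any single character that is not one of ( & l t ; / p g )", not "anything other than the sequence &lt;/p&gt".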
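A small sketch of the difference, for illustration only. The negative-lookahead pattern in the second call is not from this answer; it is a commonly used idiom for "any character, as long as the given sequence does not start here". The sample input is assumed, with the entity semicolons in place.

```php
<?php
// Assumed sample input with properly terminated entities.
$input = '&lt;p&gt;just some text&lt;/p&gt; more text!';

// Negated character class: consumes single characters until it meets any of
// ( & l t ; / p g ), so it stops at the first "t" of "just".
preg_match('#&lt;p&gt;([^(&lt;/p&gt)]+)#', $input, $classMatch);

// Negative lookahead: before each character, assert that the sequence
// "&lt;/p&gt;" does not begin at this position, then consume one character.
preg_match('#&lt;p&gt;((?:(?!&lt;/p&gt;).)+)&lt;/p&gt;#', $input, $seqMatch);

echo $classMatch[1], "\n"; // jus
echo $seqMatch[1], "\n";   // just some text
```

Note that . does not match newlines by default; the s modifier would be needed if the paragraph text can span lines.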

If you have escaped HTML entities, you can use htmlspecialchars_decode() to convert them back into characters.
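As a minimal sketch of that step (the input strings are made up for illustration), htmlspecialchars_decode() turns the escaped tags back into markup and leaves other named entities untouched:

```php
<?php
// Assumed sample input: escaped paragraph tags around some text.
$escaped = '&lt;p&gt;just some text&lt;/p&gt; more text!';
$decoded = htmlspecialchars_decode($escaped);

echo $decoded, "\n"; // <p>just some text</p> more text!

// Only the special characters (such as &amp;, &quot;, &lt;, &gt;) are converted;
// an entity like &copy; is left as it is.
$mixed = htmlspecialchars_decode('&lt;b&gt; &copy;');

echo $mixed, "\n"; // <b> &copy;
```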

After you have valid HTML, you can use the DOM to parse, traverse, and manipulate it. See: How do you parse and process HTML/XML in PHP?
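A rough sketch of the DOM route, assuming PHP's dom extension is available and using a made-up input string. It decodes the entities first and then reads the text of each p element:

```php
<?php
// Assumed sample input: escaped paragraph tags around some text.
$escaped = '&lt;p&gt;just some text&lt;/p&gt; more text!';
$html    = htmlspecialchars_decode($escaped);

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html); // the fragment is wrapped in html/body by the parser

// Gather the text content of every <p> element.
$texts = array();
foreach ($dom->getElementsByTagName('p') as $p) {
    $texts[] = $p->textContent;
}

print_r($texts);
```

For this input, $texts should hold the single entry "just some text".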

edited May 23 '17 at 10:25

Community

answered Dec 19 '13 at 18:23

ThW

Since [] matches only individual characters, what do I use if I want to match strings? – just_a_coder Dec 19 '13 at 18:36

score 0 · Accepted Answer · answered Dec 19 '13 at 18:24

I think I figured it out. Here is the code:

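<?php
$input = "<p>text</p>";
// Escape the markup so the tags become &lt;p&gt; and &lt;/p&gt;
$tem = htmlspecialchars($input);
// Lazily capture everything between the escaped tags and restore the real tags
$tem = preg_replace('/&lt;p&gt;(.+?)&lt;\/p&gt;/', '<p>$1</p>', $tem);
echo $tem;
?>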
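To show why the lazy quantifier in (.+?) matters here, a small sketch with two paragraphs (the input is made up for illustration). The greedy variant is included only for comparison:

```php
<?php
// Assumed sample input: two escaped paragraphs in a row.
$escaped = htmlspecialchars('<p>one</p><p>two</p>');

// Lazy: (.+?) stops at the first closing tag, so each paragraph is matched separately.
$lazy = preg_replace('/&lt;p&gt;(.+?)&lt;\/p&gt;/', '<p>$1</p>', $escaped);

// Greedy: (.+) runs to the last closing tag, so the inner tags stay escaped.
$greedy = preg_replace('/&lt;p&gt;(.+)&lt;\/p&gt;/', '<p>$1</p>', $escaped);

echo $lazy, "\n";   // <p>one</p><p>two</p>
echo $greedy, "\n"; // <p>one&lt;/p&gt;&lt;p&gt;two</p>
```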

just_a_coder

score 0 · Answer 3 · answered Dec 19 '13 at 18:27

You don't need to capture the content between the p tags; you only need to replace the tags themselves:

$html = preg_replace('~&lt;(/?p)&gt;~', '<$1>', $html);

However, you don't need regex either:

$trans = array('&lt;p&gt;' => '<p>', '&lt;/p&gt;' => '</p>');
$html = strtr($html, $trans);

score 0 · Answer 4 · answered Dec 19 '13 at 18:41

At least part of the trouble you're having is probably due to the fact that you seem to be playing fast and loose with the semicolons in your HTML entities. They always start with an ampersand and end with a semicolon. So it's &gt;, not &gt as you have scattered through your post.

That said, why not use html_entity_decode(), which doesn't require abusing regular expressions?

$string = 'shoop &lt;p&gt;da&lt;/p&gt; woop';
echo html_entity_decode($string);
// output: shoop <p>da</p> woop
