Insert paragraphs in text, except within code blocks

Question

I'm inserting HTML paragraphs (<p></p>) into a piece of text, like this:

$text = '<p>' . preg_replace("/(\n|\r|\r\n)+/i", "</p><p>", $text) . '</p>' ;

This seems to work well, except that I don't want any paragraphs within <code></code> blocks, since the content within those blocks is pre-formatted (using a white-space:pre; style).

I'm not sure how best to handle this. I've tried to remove any such tags after the above line of code, but that's causing me some trouble and I figure it would be much better not to insert them in the first place.

Is it possible and/or practical to make the exclusion in the regex above? If not, what else?

Thanks

Edit: Came up with this code based on Nameless' answer below. It appears to work.

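// Split into alternating chunks; the capture group keeps each <code>...</code> block in the result.
$chunks = preg_split("/(<code>.*?<\/code>)/is", $text, -1, PREG_SPLIT_DELIM_CAPTURE) ;
$text = '' ;
foreach($chunks as $chunk) {
    if (preg_match("/^<code>/i", $chunk)) {
        // Pre-formatted block: copy it through unchanged.
        $text .= $chunk ;
    } else {
        // Ordinary text: turn each run of line breaks into a paragraph boundary.
        $text .= '<p>' . preg_replace("/(\n|\r)+/i", "</p><p>", $chunk) . '</p>' ;
    }
}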

Sorry. This text "Line one\n\n\rLine two\r\nLine three\nLine four" would become "<p>Line one</p><p>Line two</p><p>Line three</p><p>Line four</p>". And I know CSS is for styling, but the HTML still tells CSS where to apply those styles. – Aug 20 '11 at 15:04
You want to use an HTML toolkit for this. See http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662. — Gordon, Aug 21 '11 at 13:26

score 1 · Accepted Answer · answered Aug 20 '11 at 15:14

Well, it is possible with the PCRE regex engine, but it is highly impractical and resource-heavy.

$text = '<p>' . preg_replace("/(\n|\r|\r\n)+(?!(.(?!<code>))*<\/code>)|(\n|\r|\r\n)+(?=<code>)/is", "</p><p>", $text) . '</p>' ;

Using DOM is probably the best solution, if you can spend some additional RAM on this operation. If not, you could split your string beforehand into chunks of <code> ... </code> and everything else, then use your regex on the chunks that are not in <code>, then glue it all back into one string.
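The split-and-glue approach described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the function name paragraphize is hypothetical, <code> tags are assumed to be plain (no attributes, no nesting), and, unlike the snippet in the question's edit, empty pieces are dropped so that no empty <p></p> pairs are emitted.

```php
<?php
// Hypothetical helper (not from the original thread): wraps lines in <p>...</p>
// everywhere except inside <code>...</code> blocks.
function paragraphize(string $text): string
{
    // The capture group together with PREG_SPLIT_DELIM_CAPTURE keeps the
    // <code>...</code> blocks in the result, alternating with the text around them.
    $chunks = preg_split('/(<code>.*?<\/code>)/is', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($chunks as $chunk) {
        if (preg_match('/^<code>/i', $chunk)) {
            // Pre-formatted block: copy it through untouched.
            $out .= $chunk;
            continue;
        }
        // Split on runs of line breaks (\n, \r or \r\n) and skip empty pieces.
        $lines = preg_split('/[\r\n]+/', $chunk, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($lines as $line) {
            $out .= '<p>' . $line . '</p>';
        }
    }
    return $out;
}
```

Called as paragraphize($text), it leaves the line breaks inside each <code> block alone and places the block between the surrounding paragraphs rather than inside one.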

Nameless

Thanks. The idea of splitting it up seems to be a good one. I edited in my solution based on that suggestion. It seems to work. I don't know if it's the most efficient way, maybe I'll look into DOM sometime. – Aug 20 '11 at 15:55

score -1 · Answer 2 · answered Aug 20 '11 at 15:00

Never ever ever ever ever ever try to parse HTML with regex.

Use for example PHP's DOM: http://php.net/manual/en/book.dom.php

:)

PeeHaa

Is that part of the standard installation of PHP, or something I need to install separately? If the latter, I may not have access to it (could check, of course). – Aug 20 '11 at 15:06
@MCXXII: The `libxml` extension needs to be installed, although I think it is installed by default. You can check your installed extensions by calling `phpinfo();` – PeeHaa Aug 20 '11 at 15:09
It's installed. I may look into it at some point. Thanks for the link. – Aug 20 '11 at 15:57
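
As an alternative to reading through the phpinfo() output, the check discussed in the comments above could be done in a few lines. This is a minimal sketch; the variable names are only illustrative.

```php
<?php
// Report whether the extensions needed for DOM parsing are loaded.
$hasLibxml = extension_loaded('libxml');
$hasDom = extension_loaded('dom') && class_exists('DOMDocument');

echo 'libxml: ' . ($hasLibxml ? 'available' : 'missing') . "\n";
echo 'DOM: ' . ($hasDom ? 'available' : 'missing') . "\n";
```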
