I have a system set up for users to submit their articles into my database. Since it will just be HTML, I don't want to expect them to know to type <br /> every time there's a newline, so I am using the PHP function nl2br() on the input.

I'm also providing an article modification tool (on a separate page), which loads an article back into the form so the user can edit it. When I did this, the <br /> elements appeared in the form as well, alongside the original newlines. To remove the <br /> elements (which I had expected to see anyway), I added preg_replace('/<br(\s+)?\/?>/i', "\n", mysql_result($result,$i,"content")), which I found in another question on this site. It removes the <br /> elements, but because it replaces each one with a newline and the original newline is still there, every edit adds more newlines, spacing the paragraphs further apart each time. This is something a user won't understand.

As an example, say I enter the following into the article submission form:

 Hello, this is my article.
 I am demonstrating a new line here.

This will convert to:

Hello, this is my article.<br />
I am demonstrating a new line here.

Notice that, even though the newline character was converted, there is still a newline in the text. In the editing form, the <br /> will be converted back to newline and look like this:

Hello, this is my article.

I am demonstrating a new line here.

This happens because the <br /> was converted to a newline while the original newline was still there. So I guess what I expected was for the text to be converted to something like this in the first place:

Hello, this is my article.<br />I am demonstrating a new line here.

I'm wondering ... is there a way to stop the nl2br() function from maintaining the original newlines? Might it have to do with the Windows \r\n character?
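
To make the accumulation concrete, here is a minimal sketch of the save/edit cycle described above. The helper names saveArticle and loadForEditing are hypothetical stand-ins for the two pages; only nl2br() and the regex come from the question.

```php
<?php
// Simulates the save/edit cycle from the question (helper names are hypothetical).

// On save: nl2br() inserts "<br />" before each newline but keeps the newline.
function saveArticle(string $formText): string
{
    return nl2br($formText);
}

// On edit: the regex from the question swaps each <br /> for "\n".
function loadForEditing(string $stored): string
{
    return preg_replace('/<br(\s+)?\/?>/i', "\n", $stored);
}

$text = "Hello, this is my article.\nI am demonstrating a new line here.";

for ($round = 1; $round <= 3; $round++) {
    $text = loadForEditing(saveArticle($text));
    // Count the newlines now separating the two sentences.
    echo "After edit $round: ", substr_count($text, "\n"), " newline(s)\n";
}
```

Each round doubles the number of newlines between the two sentences, because every existing newline gains a <br /> that is then turned into one more newline.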

BenMorel
muttley91
  • You are probably in for some serious headaches, unless your users are both good at HTML and absolutely trustworthy. If you allow people to upload mostly unfiltered HTML to your website, you are introducing a gaping XSS hole, and probably a bunch of social engineering entry points as well. – tdammers Feb 26 '12 at 16:22
  • Why do you need to convert newlines to `<br />` when the article is submitted in the first place? You can format the article when you're displaying it. – lia ant Feb 26 '12 at 16:27
  • @tdammers The only reason I'm not filtering it currently is that I want to allow them to put the following: `` and ``. If I could allow ONLY those and nothing else, I would gladly do that, as I very much agree that accepting arbitrary HTML on my site is not welcome. I should mention that those who can submit articles are restricted by an invite-only registration process, but nonetheless it's still something I'd fix given the opportunity. – muttley91 Feb 26 '12 at 17:20
  • @liaant This is a much better idea than modifying the content going into the database. I'm not sure why I didn't think of this. – muttley91 Feb 26 '12 at 17:21
  • @rar: Even has some opportunities for XSS, e.g. `` – tdammers Feb 26 '12 at 19:56
  • That's a very good point. I'm now looking into switching over to BBCode or another mark-up that will prevent users from being able to insert such things. – muttley91 Feb 26 '12 at 22:54
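
The suggestion in the comments (store the raw text and format it only when displaying) could look roughly like the sketch below. The function name renderArticle is hypothetical, and the sketch assumes all user markup should be escaped; allowing a small whitelist of tags, as the asker wanted, is not shown.

```php
<?php
// Store the raw text; convert only when rendering (function name is hypothetical).
function renderArticle(string $rawText): string
{
    // Escape first so user-supplied markup is shown as text, not executed.
    $escaped = htmlspecialchars($rawText, ENT_QUOTES, 'UTF-8');
    // Then add <br /> for display only; the stored text never contains it.
    return nl2br($escaped);
}

$stored = "Hello, this is my article.\nI am <b>demonstrating</b> a new line here.";
echo renderArticle($stored), "\n";
```

With this approach the edit form can show the stored text as it is (passed through htmlspecialchars() inside the textarea), so no <br /> removal is needed and repeated edits cannot add newlines.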

3 Answers

It seems the behavior you described is not a bug but a feature of nl2br. You could write your own function instead, like:

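<?php
// Replace each line break with <br> instead of inserting <br> before it.
function NlToBr($inString)
{
    // Match \r\n first so a Windows line ending yields a single <br>.
    return preg_replace("%\r\n|\r|\n%", "<br>", $inString);
}
?>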

I found this one in the comments on the nl2br documentation in the PHP Manual: http://php.net/manual/de/function.nl2br.php. If the one I posted does not work for you, there are plenty more where it came from.

(Or just use the function from the other Answer that was just posted, I guess that should work, too)

malexmave

The function you're using, nl2br, inserts a <br /> before each newline; it does not replace the newline. If you want to replace \n with <br />, you just need str_replace, like so:

$string = str_replace("\n","<br />",$string);

There is absolutely no need for regex in this situation.

Navarr
  • That’s simple `nl2br`, he wants the other way around (`br => \n`). – Mikulas Dite Feb 26 '12 at 16:35
  • On rereading the function description, `nl2br()` doesn't actually replace anything; it just adds `<br />` before all newlines. That explains why the newlines are still there, so I expect str_replace will actually replace them. Since I'm the only one who will be putting these elements in, I'm not sure why I didn't do something like this in the first place. – muttley91 Feb 26 '12 at 17:10
  • Additionally, I normally use something like `$string = str_replace(array("\r\n","\r","\n"),"<br />",$string);` to cover all the commonly used line-ending combinations. – Navarr Feb 26 '12 at 19:45
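
Putting the answer and the last comment together, a round trip might be sketched as follows. The helper names newlinesToBreaks and breaksToNewlines are hypothetical; the sketch assumes the stored text should contain no raw newlines at all, so converting back never finds a leftover newline next to a <br />.

```php
<?php
// Hypothetical helper pair; the names are illustrative only.

// Replace (not supplement) every line ending with <br />.
function newlinesToBreaks(string $text): string
{
    // "\r\n" comes first so a Windows line ending becomes one <br />, not two.
    return str_replace(array("\r\n", "\r", "\n"), "<br />", $text);
}

// Turn each <br>, <br/> or <br /> back into a single newline for the edit form.
function breaksToNewlines(string $html): string
{
    return preg_replace('/<br\s*\/?>/i', "\n", $html);
}

$article = "Hello, this is my article.\r\nI am demonstrating a new line here.";
$stored  = newlinesToBreaks($article);
$edited  = breaksToNewlines($stored);
echo $stored, "\n";
```

Because the stored string has no newlines of its own, saving and reloading any number of times yields the same text, with line endings normalized to \n.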

This should fix it:

$res = preg_replace('/<br(\s+)?\/?>(?!\s*\n)/i', "\n", mysql_result($result,$i,"content"));

You cannot simply remove the breaks, because a break might sit in the middle of a line with no newline after it. This regex replaces every break with a newline, except those that are already followed by a newline.

That leaves the <br>\n sequences in the text. A second regex gets rid of them:

$res = preg_replace('/<br(\s+)?\/?>/i', "", $res);

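
As a possible alternative to the two passes, a single pattern can consume each break together with an optional line ending that follows it. This is a sketch of a different technique, not the regex from this answer, and the function name brToNewline is hypothetical.

```php
<?php
// Hypothetical single-pass variant: a <br> plus any line ending directly
// after it is replaced by exactly one "\n".
function brToNewline(string $stored): string
{
    return preg_replace('/<br\s*\/?>(?:\r\n|\r|\n)?/i', "\n", $stored);
}

// Text as nl2br() would have stored it: the break is followed by a newline.
$stored = "Hello, this is my article.<br />\nI am demonstrating a new line here.";
echo brToNewline($stored), "\n";
```

Since a break followed by a newline and a break on its own both collapse to one newline, running nl2br() on save and this function on edit should leave the text unchanged from one edit to the next.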
Mikulas Dite