106

This question has bothered me for a million years... whenever I create a website with a textarea that allows multi-line input (such as a "Bio" for a user's profile), I always end up writing the following paranoid code:

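// C# code sample (needs: using System.Text.RegularExpressions;)
// Convert CRLF and bare CR to LF...
bio = bio.Replace("\r\n", "\n").Replace("\r", "\n");
// ...then collapse runs of two or more newlines into exactly two.
bio = Regex.Replace(bio, @"\n{2,}", "\n\n");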

So, what do browsers send up for a <textarea name="Bio"></textarea> if it has multiple lines?

Timothy Khouri
  • Wow, I thought this was a strange question that wasn't going to get any attention... but 16 votes in 1 hour, craziness. – Timothy Khouri Jun 12 '11 at 21:12
  • Thinking about it, I've never come across a problem related to this. If someone enters a newline, it's shown as a newline, in all OS'es, in MySQL clients, in browsers, etc. Looks like this implies that most software has a somewhat consistent take on the matter. Of course, if I'm going to do something important with it, I always normalize. – Halil Özgür Jun 13 '11 at 00:18
  • The problem would come in if I relied on "\r\n", and then was building an "HTML formatted" version of the user's Bio, and since I never run across a "\r\n", I lump it all in one tag. – Timothy Khouri Jun 13 '11 at 02:48
  • 1
    A lot of things have changed since this was answered - does anyone know if today (the year 2023) this has been standardized across browsers ? – gillyb Jul 27 '23 at 07:36

2 Answers

52

The HTTP and MIME specs specify that header lines must end with \r\n, but they aren't clear (some would argue that it isn't clear if they are clear) about what to do with the contents of a TEXTAREA. (See, for instance, this thread from an HTML working group about the issue.)

Here's a quote from the HTTP/1.1 spec about message headers:

The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.

I think that is a good strategy in general: be strict about what you produce but liberal in what you accept. You should assume that you will receive all sorts of line terminators. (Note that in addition to CRLF and LF, Mac OS 9 and earlier used CR alone, and there are still a few of those systems around. The Unicode standard (section 5.8) specifies a wide range of character sequences that should be recognized as line terminators; there's a list of them here.)
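As a rough illustration of "liberal in what you accept", here is a minimal C# sketch that maps every line terminator mentioned above to a single LF. The class and method names (`NewlineNormalizer`, `Normalize`) are made up for this example, and treating VT and FF as line breaks is an assumption taken from the Unicode newline guidelines; you may prefer to leave those two out for something like a user bio.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any framework.
public static class NewlineNormalizer
{
    // CRLF is listed first so it is consumed as ONE terminator, not two.
    // The character class covers CR, LF, VT, FF, NEL (U+0085),
    // LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029).
    private static readonly Regex AnyLineTerminator =
        new Regex("\r\n|[\r\n\u000B\u000C\u0085\u2028\u2029]", RegexOptions.Compiled);

    // Returns the text with every recognized terminator replaced by "\n".
    public static string Normalize(string text)
    {
        if (text == null) return string.Empty;
        return AnyLineTerminator.Replace(text, "\n");
    }
}
```

With this in place, `NewlineNormalizer.Normalize(bio)` could stand in for the chained `Replace` calls in the question, and the `\n{2,}` collapse would then run on the normalized string.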

Ted Hopp
  • 6
    I don't believe the specs specify what a textarea produces. – Mark Thomas Jun 12 '11 at 19:59
  • 2
    @Will: Read the original question again. It specifically asks about how browsers encode the content of a `textarea` (which is something that the spec, or at least Ted's quoted section of it, does not constrain). – John Bartholomew Jun 12 '11 at 20:09
  • 2
    @Mark - you are right. There are endless debates about that problem in various forums. (See [this thread from 1995](http://ksi.cpsc.ucalgary.ca/archives/HTML-WG/html-wg-95q1.messages/0035.html) from an HTML working group.) – Ted Hopp Jun 12 '11 at 20:10
  • @Ted: Nice link! Perhaps include that in the body of your answer? – John Bartholomew Jun 12 '11 at 20:11
  • @Mark Well, I guess. Anyway, the only sane thing to do is take what you're given and normalize it. – Will Martin Jun 12 '11 at 20:19
  • 2
    This answer needs to be edited. It starts out citing the HTTP spec but that does not pertain to textareas. – DuckMaestro Jun 12 '11 at 20:59
  • @DuckMaestro - did you see the edit from 44 minutes before your comment (and, I assume, your downvote)? – Ted Hopp Jun 12 '11 at 21:13
  • 2
    I did, but the answer still starts out with citing HTTP, which is the wrong spec to emphasize, if mentioned at all. Your included quote specifically addresses "message-header fields" but `textarea`s are not sent as message-header fields. `textarea`s get encoded into the message-body, which is different. – DuckMaestro Jun 12 '11 at 22:34
  • Almost all of the RFCed text protocols specify that an application must send CRLF, but should tolerate the other end of the connection sending just LF. Most applications follow this. – Chris S Jun 13 '11 at 01:19
  • @Duck - Edited per your suggestion. I hope the reorganization improves the emphasis. – Ted Hopp Jun 13 '11 at 01:22
  • 1
    @Chris - Please cite an RFC that clearly specifies what an application should send for the **contents** of a TEXTAREA in which a user has entered line breaks. Cite an RFC specifying that a JavaScript submission of a form needs to translate native line breaks into CRLF sequences. One might infer that, but one could also then infer that the same rule should apply to FILE fields, where it clearly does not. – Ted Hopp Jun 13 '11 at 01:39
  • @TedHopp A bit late but [RFC 1867 Section 5.5](https://www.ietf.org/rfc/rfc1867.txt) states _that the client should properly encode the data before sending it back to the server with CRLFs_. Also [W3C](http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.1) says that CRLF have to be used. – Fabian Barney Oct 07 '14 at 13:56
  • @FabianBarney - The important word in RFC 1867 is _should_. In RFC terminology, that is weaker than _shall_ or _must_. The W3C text applies to `application/x-www-form-urlencoded` posts, which it also recommends against. For `multipart/form-data`, it's pretty clear that TEXTAREA data should be transmitted as MIME type `text/plain` and [RFC 2046](http://www.rfc-editor.org/rfc/rfc2046.txt) is pretty clear that CRLF must be used. But it's also clear that this is not always observed ([see here](http://serverfault.com/questions/311832/tell-apache-to-convert-lf-to-crlf-for-text-plain), for instance). – Ted Hopp Oct 07 '14 at 18:09
30

what do browsers send up for a <textarea></textarea> if it has multiple lines?

All modern browsers send CRLF (\r\n). However this is not something that has been satisfactorily standardised so I would definitely consider it worthwhile to normalise the newlines of all multi-line input text.

When the value is read through JavaScript rather than being submitted directly from a form, browser behaviour differs. IE and Opera return strings containing CRLFs; Firefox and WebKit return LF. So any form that is submitted with the help of JavaScript/XMLHttpRequest may arrive with either kind of line ending.
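Since the same field may arrive with CRLF or LF, server code that turns the text into HTML is safer if it normalises first and only then looks for paragraph breaks. Below is a minimal C# sketch of that idea, using the "Bio" scenario from the question. The names `BioFormatter` and `ToHtmlParagraphs` are invented for this example, and the choice of `<p>` and `<br>` as output markup is an assumption rather than anything the question specifies.

```csharp
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class BioFormatter
{
    public static string ToHtmlParagraphs(string bio)
    {
        if (string.IsNullOrWhiteSpace(bio)) return string.Empty;

        // Accept CRLF, bare CR, or bare LF, whichever the client sent.
        string normalized = bio.Replace("\r\n", "\n").Replace("\r", "\n");

        // Two or more consecutive newlines separate paragraphs.
        var paragraphs = Regex.Split(normalized.Trim(), @"\n{2,}")
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            // HTML-encode first, then turn remaining single newlines into <br>.
            .Select(p => "<p>" + WebUtility.HtmlEncode(p).Replace("\n", "<br>") + "</p>");

        return string.Join("\n", paragraphs);
    }
}
```

Because normalisation happens before the split, a bio submitted by a plain form post and the same bio submitted via script should produce identical markup.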

thirtydot
bobince
  • Does JavaScript behave consistently on any particular browser across platforms? (For instance, does Firefox return strings with LF on Windows, Macs, and mobile platforms?) – Ted Hopp Jun 13 '11 at 17:16
  • 1
    @Ted: This behaviour is consistent across platforms on Firefox, Opera and WebKit. IE5/Mac I haven't tested, as it's long-dead now, but that browser has many differences to IE5/Win. – bobince Jun 14 '11 at 19:20
  • I'm using Chrome on mac os and I don't see that my inputs are sending '\r\n'.. I only see a '\n'. Has this been standardized ? If a windows user sends my form today, will I receive the CRLF, or just LF ? – gillyb Jul 27 '23 at 07:38