24

Consider this string:

var s = "A\0Z";

Its length is 3, as given by s.length. Using console.log you can see that the string isn't cut, that s[1] is the NUL character "\0" (it prints as if it were empty), and that s.charCodeAt(1) is 0.
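A minimal sketch of those checks, runnable in any console. The helper `codeUnits` is a hypothetical name introduced only for illustration; it lists the UTF-16 code unit at each index so that the NUL character becomes visible. Note that s[1] only looks empty when printed; it is really the one-character string "\0".

```javascript
// The string from the question: "A", a NUL character (U+0000), then "Z".
var s = "A\0Z";

// Hypothetical helper (not part of the question): collect the UTF-16
// code unit found at every index of the string.
function codeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

console.log(s.length);       // 3
console.log(codeUnits(s));   // code units 65, 0, 90
console.log(s[1] === "\0");  // true
console.log(s[1] === "");    // false: it is one character, not an empty string
```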

When you alert it in Firefox, you see AZ. When you alert it in Chrome/Linux using alert(s), the \0 terminates the string and you see A.

My question is: what should browsers and JavaScript engines do? Is Chrome buggy here? Is there a document defining what should happen?

As this is a question about the standard, a reference is needed.

Denys Séguret
  • 1
    In the Chrome 23 console I see `AB`. – James Allardice Dec 04 '12 at 08:25
  • @JamesAllardice yes, and if you write console.log(s, s.length, s[0], s[1], s[2]) you see the whole string. The problem in Chrome is when using `alert(s)`. – Denys Séguret Dec 04 '12 at 08:26
  • @dystroy `alert`s are very much OS dependent, there isn't really a way these could be standardised across browsers – Asad Saeeduddin Dec 04 '12 at 08:28
  • 2
    sure they can. by following the standard. \0 is not a recognized escape character :) – Gung Foo Dec 04 '12 at 08:29
  • Do you actually use Chrome on Windows or Linux? Perhaps different *window managers* treat `\0` differently, and you *may* see different results... – Alvin Wong Dec 04 '12 at 08:57
  • Until now I had only tested it on linux but it seems to work as expected on Chrome/Windows. This might just be a bug of Chrom(ium|e)/linux/gnome... – Denys Séguret Dec 04 '12 at 09:00
  • alert('A\0Z') in Chrome now shows 'AZ'. Bug fixed. :) – broofa Sep 18 '13 at 00:37
  • @broofa It's not fixed on Chrome or Chromium on Ubuntu (28.0.1500.71-0ubuntu1.12.04.1). – Denys Séguret Oct 09 '13 at 16:01
  • Chrome has fixed the issue in Chrome 44/Mac – Paul Lan Sep 01 '15 at 06:09
  • Also fixed on Chrome 44 Ubuntu – Denys Séguret Sep 01 '15 at 08:04
  • @PaulLan - Appears to have broken again in Chrome 47 on El Capitan. – JDB Jan 06 '16 at 19:23
  • @JDB I don't reproduce it on Chromium 47.0.2526.73 Ubuntu 15.10 (64-bit) For reference for everybody: [the registered issue](https://code.google.com/p/chromium/issues/detail?id=164126). – Denys Séguret Jan 06 '16 at 19:59
  • @DenysSéguret - Perhaps it's just a Mac issue, but you can watch the video I made (attached to my comment on the issue): https://chromium.googlecode.com/issues/attachment?aid=1641260004000&name=Chrome+164126_720.mov&token=ABZ6GAeiAZ5ovLGQaomgco62M5bQAJkyDA%3A1452114410101&id=164126&mod_ts_token=ABZ6GAeu5_01Be9aRkIn5K2JOvs1TXNDdA%3A1452114410101 – JDB Jan 06 '16 at 21:07

3 Answers

21

What the browser should do is keep track of the string and its length separately, since the standard has no notion of a null terminator. (A string is just a value with a length.)

What Chrome seems to do (I am taking your word for this) is use the standard C string functions, which terminate at a \0. To answer one of your questions: yes, to me this constitutes a bug in Chrome's handling of the alert() function.
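To illustrate the suspected mechanism, here is a small simulation of what a NUL-terminated C-style API would see when handed a JavaScript string. This is only a sketch of the hypothesis above, not Chrome's actual code; `asSeenByCStringApi` is a made-up name.

```javascript
// Hypothetical simulation: a C-style API stops reading at the first NUL,
// so everything from that position onward is silently dropped.
function asSeenByCStringApi(str) {
  var end = str.indexOf("\0");
  return end === -1 ? str : str.slice(0, end);
}

var s = "A\0Z";

console.log(s.length);                       // 3: the engine keeps the whole string
console.log(asSeenByCStringApi(s));          // A
console.log(asSeenByCStringApi(s).length);   // 1
console.log(asSeenByCStringApi("AZ"));       // AZ: unchanged when there is no NUL
```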

Formally the spec says:

A string literal is zero or more characters enclosed in single or double quotes. Each character may be represented by an escape sequence. All characters may appear literally in a string literal except for the closing quote character, backslash, carriage return, line separator, paragraph separator, and line feed. Any character may appear in the form of an escape sequence.

Also:

A string literal stands for a value of the String type. The String value (SV) of the literal is described in terms of character values (CV) contributed by the various parts of the string literal.

And regarding the NUL byte:

The CV [Character Value] of EscapeSequence :: 0 [lookahead ∉ DecimalDigit] is a <NUL> character (Unicode value 0000).

Therefore, a NUL character should simply be "yet another character value" with no special meaning, unlike in languages such as C where it can end a string value (SV).

For reference on the (valid) "String Single Character Escape Sequences", have a look at the ECMAScript Language spec, section 7.8.4. There is a table at the end of the paragraph listing the aforementioned escape sequences.

What someone aiming to write a JavaScript engine could probably learn from this: Don't use C/C++ string functions. :)
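As a quick check that the language really treats NUL as an ordinary character, the following sketch compares a few equivalent spellings of it and runs some everyday string operations across it. The variable names are arbitrary.

```javascript
// Different escape sequences for the same character, U+0000.
var nul = "\0";
console.log(nul === "\u0000");               // true
console.log(nul === "\x00");                 // true
console.log(nul === String.fromCharCode(0)); // true

// Ordinary string operations do not stop at the NUL character.
var s = "A\0Z";
console.log(s.indexOf("Z"));     // 2
console.log(s.split("\0"));      // two parts: "A" and "Z"
console.log((s + "!").length);   // 4
console.log(JSON.stringify(s));  // "A\u0000Z"
```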

dda
Gung Foo
  • not a relevant comment: "If \ is followed by a decimal number n whose first digit is not 0".. which is not the case here, see the edit to my post – Gung Foo Dec 04 '12 at 08:42
  • Did you read the whole section? It looks like you only read the "Note"...it has "If i is zero, return the EscapeValue consisting of a character (Unicode value 0000)." and "\0 represents the character and cannot be followed by a decimal digit.", let alone other general information about escaping... – Ian Dec 04 '12 at 08:44
  • Do you mean the pseudocode? Yes, I read it. The paragraph adds no new information useful to this question. At least none that I can see, since the NUL character is already implicitly covered in the string literal section I referenced. Care to enlighten me why I should put it in my answer? – Gung Foo Dec 04 '12 at 08:47
  • OT: Have you reported it as a bug to Chromium or V8? I would like to see it. :) – Alvin Wong Dec 04 '12 at 08:52
  • What I find strange is how could the Chromium team make such a bug. Managing the 0 seems like the most evident problem when calling C functions... That's why I'm a little hesitant to call it a bug. – Denys Séguret Dec 04 '12 at 08:53
  • They took care of it in the console class which means the alert() function probably slipped their mind. I haven't looked at their code but imagine something like `sprintf()`(-based) moving the value from a closure to another before creating the UI element of the alert. also `strncpy` should still break at the `\0`.. probably should use `memcpy` there. – Gung Foo Dec 04 '12 at 08:55
  • 4
    @AlvinWong I entered [issue 164126](http://code.google.com/p/chromium/issues/detail?id=164126) – Denys Séguret Dec 04 '12 at 09:08
  • Note that I'll wait a little (maybe there will be a comment/reason on the issue I entered) before accepting this answer. – Denys Séguret Dec 04 '12 at 10:29
  • There still is no progress on this issue. There is no reason to wait more : I accept this answer. – Denys Séguret Dec 19 '12 at 09:28
  • I don't see anything in the snippets you quoted that gives any indication of how a string value should be displayed as output. All three quotes are about string literals, which is a code syntax matter and pretty much the opposite of any discussion about the `alert()` function. If the Chrome JS engine did not allow strings to contain `\0` _at all_, then that would be another matter, but that is not what is under discussion here. I think this is all moot anyway now, though, because Chrome v34 seems to provide the behavior dystroy is expecting. – JLRishe Apr 16 '14 at 18:07
  • '\000\0\u0000\x00' === '\0\0\0\0' – iegik May 24 '15 at 22:46
  • @DenysSéguret - I've confirmed that the issue is still present in Chrome 47. Added a comment to the Google issue you linked to. – JDB Jan 06 '16 at 19:19
  • This appears to be fixed in Chrome 81. – joerick May 05 '20 at 21:45
8

JavaScript treats the null character just like any other character. Your question is about how it is displayed in the console or in an alert. That varies between browsers and no standard covers it, so Chrome is OK.

dencey
  • 3
    I don’t understand why this was downvoted. Though compact, it was the most correct answer before Nelson’s answer, which says the same in much more detail. – Jukka K. Korpela Dec 04 '12 at 09:05
3

You are asking about behaviour of the alert() method that is not uniform across browsers, so it has nothing to do with the Script object or the ECMAScript spec as such; it is about how alert() displays a String object.

alert() is a method of the Window object and ECMAScript does not define it (it only says that the host environment may provide global objects such as the window object).

There does happen to be a W3C spec that defines alert() behaviour. Unfortunately, it is very sparse and gives no hint about how messages with embedded null characters should be shown.

So this behaviour is, as with any other detail not covered by the spec, left to the browsers' own implementations.
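Since the display is left to each browser, a script that must show such strings consistently could normalise them itself before calling alert(). A minimal sketch, assuming it is acceptable to show NUL as the Unicode "symbol for null" (U+2400); `makeNulVisible` is a hypothetical helper, not part of any spec or browser API.

```javascript
// Hypothetical helper: replace every NUL character with U+2400 so that
// no dialog implementation can mistake it for a string terminator.
function makeNulVisible(message) {
  return String(message).replace(/\u0000/g, "\u2400");
}

var s = "A\0Z";
var shown = makeNulVisible(s);

console.log(shown.length);           // 3: same length, NUL swapped for a visible symbol
console.log(shown === "A\u2400Z");   // true

// In a browser one would then call: alert(makeNulVisible(s));
```

Stripping the character instead (replacing it with an empty string) would reproduce what Firefox displays, at the cost of hiding that the NUL was there.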

Nelson
  • 1
    Given that it is already defined how a NUL char in a string literal is to be treated, a redefinition would be redundant, no? – Gung Foo Dec 04 '12 at 08:57
  • 3
    The cited “w3c spec” itself says: “This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.” Moreover, it only says that the `alert()` method shall “show the given *message* to the user”. It does not specify how control characters shall be interpreted in it. So stopping at NUL, though bad quality, does not violate any specification. – Jukka K. Korpela Dec 04 '12 at 09:10