Same output but different character length

Question

I have this script:

    var last_build_no = this.getTitle();
    var plain_build_no = "#53 ";
    console.log(last_build_no.length);
    console.log(plain_build_no.length);

And this is the output:

    5
    4
    '#53 '
    '#53 '

What could be the reason for this difference, and how can I convert these strings to the same format?

Because of this difference my test case is failing, even though the strings look the same:

    test.assertEquals(last_build_no, plain_build_no, "Last Build page has expected title");

Looks like you might have a wide character in there; make sure both are using the same encoding (e.g. UTF-8). — Patrick Evans, Jun 15 '17 at 22:51
That's what I'm trying to figure out. The script file has UTF-8 encoding. How can I convert the two strings to the same encoding? — mirza, Jun 15 '17 at 22:53
That first '#53 ' contains a [zero-width space](http://www.fileformat.info/info/unicode/char/200b/index.htm) between the 5 and the 3. Perhaps you could get away with stripping all whitespace? Or you could compare against a string that also has a zero-width space in there. — user94559, Jun 15 '17 at 22:59
BINGO! Maybe you could post that as an answer so I can accept it. — mirza, Jun 15 '17 at 23:07

score 1 · Accepted Answer · answered Jun 15 '17 at 23:07

The string contains a "zero width space". You can see it if you log the character codes:

    last_build_no.split("").forEach(function (c) {
      console.log(c.charCodeAt(0));
    });

    /*
      Outputs:
      35
      53
      8203  <-- http://www.fileformat.info/info/unicode/char/200b/index.htm
      51
      32
    */
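The decimal codes above have to be converted to hex before they can be looked up in a Unicode chart (8203 is 0x200B). As a small sketch, the same dump can be printed directly in U+XXXX notation. The helper name `toCodeUnits` is hypothetical, and the code is kept to ES5 syntax on the assumption that it may need to run inside a CasperJS script.

```javascript
// Hypothetical helper: list every UTF-16 code unit of a string in U+XXXX
// notation, so invisible characters can be looked up in a Unicode chart.
function toCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    // Pad to four hex digits, e.g. "23" -> "0023"
    while (hex.length < 4) {
      hex = "0" + hex;
    }
    units.push("U+" + hex);
  }
  return units;
}

// The title from the question, with the zero width space written as an escape
console.log(toCodeUnits("#5\u200B3 ").join(" "));
// prints "U+0023 U+0035 U+200B U+0033 U+0020"
```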

Commonly encountered zero-width characters in Unicode include:

- U+200B zero width space
- U+200C zero width non-joiner
- U+200D zero width joiner
- U+FEFF zero width no-break space (also used as the byte order mark)
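Before stripping anything, it can help to confirm which of these characters is present and where. A minimal sketch, assuming only the four characters listed above are of interest (`findZeroWidth` and `ZERO_WIDTH_NAMES` are hypothetical names):

```javascript
// Hypothetical lookup table: the zero-width characters listed above, keyed by character
var ZERO_WIDTH_NAMES = {
  "\u200B": "ZERO WIDTH SPACE",
  "\u200C": "ZERO WIDTH NON-JOINER",
  "\u200D": "ZERO WIDTH JOINER",
  "\uFEFF": "ZERO WIDTH NO-BREAK SPACE"
};

// Hypothetical helper: report the index and name of each zero-width character
function findZeroWidth(str) {
  var hits = [];
  for (var i = 0; i < str.length; i++) {
    var ch = str.charAt(i);
    if (Object.prototype.hasOwnProperty.call(ZERO_WIDTH_NAMES, ch)) {
      hits.push({ index: i, name: ZERO_WIDTH_NAMES[ch] });
    }
  }
  return hits;
}

console.log(JSON.stringify(findZeroWidth("#5\u200B3 ")));
// prints [{"index":2,"name":"ZERO WIDTH SPACE"}]
```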

You can remove it with a simple regular expression:

    var last_build_no = this.getTitle().replace(/[\u200B-\u200D\uFEFF]/g, '');
    console.log(last_build_no.length);  // Output: 4
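To make the test from the question pass, one option is to strip the zero-width characters from both sides before comparing. The sketch below uses a hypothetical helper `stripZeroWidth` and hard-codes the title as `"#5\u200B3 "` to match the character codes above. It also shows why the whitespace-stripping idea from the comments is not enough on its own: in JavaScript, `\s` does not match U+200B, because that character is classified as a format character rather than whitespace.

```javascript
// The zero-width characters listed in the answer
var ZERO_WIDTH = /[\u200B-\u200D\uFEFF]/g;

// Hypothetical helper: drop zero-width characters before comparing
function stripZeroWidth(str) {
  return str.replace(ZERO_WIDTH, "");
}

var title = "#5\u200B3 ";   // assumed value of getTitle(), per the char codes above
var expected = "#53 ";

console.log(title === expected);                                   // false
console.log(stripZeroWidth(title) === stripZeroWidth(expected));   // true

// Stripping \s alone does not help: U+200B is not matched by \s
console.log(title.replace(/\s/g, "") === expected.replace(/\s/g, "")); // false
```

In the CasperJS test this would become something like `test.assertEquals(stripZeroWidth(last_build_no), plain_build_no, "Last Build page has expected title");`.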

See this SO answer for more info
