I have this script:

    var last_build_no = this.getTitle();
    var plain_build_no = "#53 ";
    console.log(last_build_no.length);
    console.log(plain_build_no.length);

And this is the output:

5
4
'#5​3 '
'#53 '

What could be the reason for this difference, and how can I convert these strings to the same format?

Because of this difference my test case is failing, even though the two strings look the same to me:

    test.assertEquals(last_build_no, plain_build_no, "Last Build page has expected title");
mirza
  • Looks like you might have a wide character in there, make sure both are using the same encoding (eg utf8) – Patrick Evans Jun 15 '17 at 22:51
  • I'm actually looking for this. The script file has utf-8 encoding. How can I convert two strings into the same encoding format ? – mirza Jun 15 '17 at 22:53
  • That first '#53 ' contains a [zero-width space](http://www.fileformat.info/info/unicode/char/200b/index.htm) between the 5 and the 3. Perhaps you could get away with stripping all whitespace? Or you could compare against a string that also has a zero-width space in there. – user94559 Jun 15 '17 at 22:59
  • BINGO! Maybe you can write an answer for it so I can accept. – mirza Jun 15 '17 at 23:07

1 Answer

The string contains a "zero width space". You can see it if you log the character codes:

    last_build_no.split("").forEach(c => console.log(c.charCodeAt(0)));

    /*
      Outputs:
      35
      53
      8203  <-- http://www.fileformat.info/info/unicode/char/200b/index.htm
      51
      32
    */
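Decimal character codes can be hard to match against Unicode charts, so here is a minimal sketch that prints each code point in the usual `U+XXXX` form instead. The helper name `toCodePoints` is hypothetical (it is not part of the original post), and the sample string writes the zero-width space as an escape so that it is visible in the source.

```javascript
// Hypothetical helper (not from the original post):
// returns each code point of the string formatted as "U+XXXX".
function toCodePoints(str) {
  return Array.from(str, ch =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Assumed sample value: the title from the question, with the
// zero-width space spelled out as \u200B.
const title = "#5\u200B3 ";
console.log(toCodePoints(title).join(" "));
// prints: U+0023 U+0035 U+200B U+0033 U+0020
```

Any entry that does not correspond to a character you can see on screen is a candidate for the hidden character.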

Unicode has the following zero-width characters:

  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner
  • U+FEFF zero width no-break space

You can remove it with a simple regular expression:

    // The zero-width space is written as \u200B here so it is visible in the source
    var last_build_no = '#5\u200B3 '.replace(/[\u200B-\u200D\uFEFF]/g, '');
    console.log(last_build_no.length);  // Output: 4

See this SO answer for more info
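To tie this back to the failing comparison in the question, here is a rough sketch that wraps the same regular expression in a function and normalizes the title before comparing. The name `stripZeroWidth` is hypothetical, and the hard-coded string only stands in for whatever `this.getTitle()` returns.

```javascript
// Hypothetical helper: removes U+200B, U+200C, U+200D and U+FEFF
// using the same character class as the regular expression above.
function stripZeroWidth(str) {
  return str.replace(/[\u200B-\u200D\uFEFF]/g, "");
}

// Assumed stand-in for this.getTitle(); the zero-width space is escaped.
const lastBuildNo = "#5\u200B3 ";
const plainBuildNo = "#53 ";

console.log(lastBuildNo === plainBuildNo);                 // prints: false
console.log(stripZeroWidth(lastBuildNo) === plainBuildNo); // prints: true
console.log(stripZeroWidth(lastBuildNo).length);           // prints: 4
```

In the test from the question, the cleaned value could then be passed as the first argument, for example `test.assertEquals(stripZeroWidth(last_build_no), plain_build_no, "Last Build page has expected title")`.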

Sky