6

I was debugging a problem with searching inside a string, and it came down to the following interesting piece of code.

Both "item " and "item " seem equal but they are not!

// The two literals look identical on screen; escapes make the difference visible.
var result = ("item\u00a0" === "item\u0020");

document.write(result);
console.log(result);

After investigating further by pasting the code into a Python interpreter, I found that the first "item " contains a different kind of space: Python shows it as "item\xc2\xa0". The bytes \xc2\xa0 are the UTF-8 encoding of U+00A0, a non-breaking space.
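
The same check can be done without leaving JavaScript. The following is only a sketch, assuming an environment that provides the standard `TextEncoder` global (modern browsers and Node.js); the variable names are made up for illustration.

```javascript
// U+00A0 (no-break space) is a single code point in a JavaScript string...
const nbsp = "\u00a0";
const codePoint = nbsp.codePointAt(0).toString(16);

// ...but it takes two bytes once encoded as UTF-8, which is what Python displayed.
const utf8Bytes = Array.from(new TextEncoder().encode(nbsp), b => b.toString(16));

console.log(codePoint, utf8Bytes); // a0 [ 'c2', 'a0' ]
```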

A possible solution to make these strings match is to replace \xc2\xa0 with a normal space, but is there a better approach that converts all special space characters to normal spaces?

Irshad P I
  • Check `"item ".charCodeAt(4)` of both strings. They are different – adiga Jan 09 '20 at 07:47
  • Yes, I have done that, and the characters are different, as mentioned in the question. My question is how to approach comparing these strings (without trimming/removing spaces). – Irshad P I Jan 09 '20 at 07:54
  • I was just mentioning how to identify it without using a Python interpreter. – adiga Jan 09 '20 at 07:57
  • You might want to take a look at the possible space characters in Unicode here: https://unicode-table.com/en/search/?q=space there are also some language specific space characters. So this can get really tricky if you are perfectionistic. – Krisztián Balla Jan 09 '20 at 08:10
  • @JennyO'Reilly, Haha, thank you. I will definitely check that out. FYI, my perfectionistic parameter is at 90 percent. – Irshad P I Jan 09 '20 at 10:03

3 Answers

7

In ES2015/ES6 you can use the String.prototype.normalize() method to decompose both characters to the same simple space character:

const normalize = str => str.normalize('NFKD');
console.log(normalize("item\u0020") === normalize("item\u00a0"));
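
One caveat worth checking before adopting this: compatibility normalization is not limited to spaces. The sketch below uses example strings chosen here for illustration (they are not from the question) to show some of the other rewrites NFKD performs.

```javascript
const normalize = str => str.normalize('NFKD');

// The non-breaking space becomes a plain space, as intended.
console.log(normalize("item\u00a0") === "item\u0020"); // true

// But other compatibility characters are rewritten as well.
console.log(normalize("\ufb01le") === "file"); // true: the "fi" ligature is split
console.log(normalize("x\u00b2") === "x2");    // true: superscript two becomes "2"
console.log(normalize("\u00e9").length);       // 2: "é" becomes "e" plus a combining accent
```

If the strings may contain such characters and they must be preserved, a replacement that targets whitespace only is probably the safer choice.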
Kaiido
5

The space in the first string is character code 160 (a non-breaking space), and the space in the second string is character code 32 (a normal space), so the strings aren't equal to each other.

console.log("item\u00a0".charCodeAt(4), "item\u0020".charCodeAt(4)); // 160 32

"is there a better approach to convert all special space characters with normal space?"

You can match space characters which aren't tabs or newlines and replace with a normal space:

const makeSpacesNormal = str => str.replace(/(?=\s)[^\r\n\t]/g, ' ');
console.log(makeSpacesNormal("item\u00a0") === makeSpacesNormal("item\u0020")); // true

Specifically, \s will match a whole bunch of space-like characters:

[\t\n\v\f\r \u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff]

and by matching and replacing those (except newlines and tabs, if you want), you'll be left with ordinary spaces.
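
When debugging a mismatch like the one in the question, it can also help to see which unusual whitespace characters a string actually contains. The helper below is a hypothetical sketch (the name `findUnusualSpaces` and the sample string are invented here); it assumes `String.prototype.matchAll` is available (ES2020).

```javascript
// Hypothetical helper: list every whitespace character that is not a plain
// space (U+0020), with its position and Unicode code point.
const findUnusualSpaces = str =>
  Array.from(str.matchAll(/(?=\s)[^ ]/g), m => ({
    index: m.index,
    codePoint: "U+" + m[0].codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  }));

const found = findUnusualSpaces("item\u00a0one\u2003two three");
console.log(found);
// [ { index: 4, codePoint: 'U+00A0' }, { index: 8, codePoint: 'U+2003' } ]
```

Note that this version also reports tabs and newlines; add them to the negated class if they should be ignored.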

CertainPerformance
  • On regex101.com it says that **\s** matches any whitespace character (equal to `[\r\n\t\f\v ]`). Which is wrong then? – Krisztián Balla Jan 09 '20 at 08:05
  • You can see a description in the official specification [here](https://tc39.es/ecma262/#sec-line-terminators) (scroll up a bit, and also scroll down; see the `WhiteSpace` and `Line Terminator` tables). I think Regex101 tried to simplify things for all languages rather than go into detail on the mechanics for each; I'm not sure how precise it is for other languages. – CertainPerformance Jan 09 '20 at 08:14
  • I will accept this answer because 1. Works cross-browser, 2. Is extensible, 3. Well detailed. – Irshad P I Jan 09 '20 at 10:11
0

trim will remove all whitespace from the beginning and end of a string. If you want to compare two strings while ignoring leading and trailing whitespace, trim both of them.

"item\u00a0".trim() === "item\u0020".trim() // true
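
A brief sketch of where this does and does not help, using example strings made up for illustration: trim() only touches the ends of the string, so a non-breaking space in the middle still makes the comparison fail.

```javascript
// trim() treats the non-breaking space as whitespace, so trailing differences vanish.
const trailingMatch = "item\u00a0".trim() === "item ".trim();
console.log(trailingMatch); // true

// Interior whitespace is left untouched, so these strings still differ.
const interiorMatch = "my\u00a0item".trim() === "my item".trim();
console.log(interiorMatch); // false
```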
Schwern