1

My team and I are working on a React/Redux project, and I want to filter out duplicate tags. However, I noticed that someone has put some tricky strings into the tags data, like the ones in the linked image.

When I log those tags to the console, the first and second tags in the list, for example, both look like "HumanIty", but when I compare them, even with the strict equality operator, I get false.

When I select and copy the text of both tags and paste it back into the console, I get a surprising result: the string in the second tag somehow has spaces between its characters (the red dots in the picture below).

If someone has faced this problem before, please explain what is going on. Thank you.

luk2302
LocV's Nest
  • 5
    The actual "tricky strings" should be posted directly here in your question, not as image links. – Pointy Aug 04 '20 at 13:40
  • 3
    You already found the problem, there are special characters in the string. Is your question how to clean those? – luk2302 Aug 04 '20 at 13:41
  • 3
    There are a number of special Unicode characters that don't render as visible marks on the screen. – Pointy Aug 04 '20 at 13:42
  • 3
    https://stackoverflow.com/questions/17978720/invisible-characters-ascii – luk2302 Aug 04 '20 at 13:42
  • Yes, I want to clean out those special characters, but I also want to know why JavaScript displays both strings as if they were the same. – LocV's Nest Aug 04 '20 at 13:44
  • 1
    Perhaps this? https://stackoverflow.com/questions/9364400/remove-not-alphanumeric-characters-from-string But not-alphanumeric chars that you want to keep will be omitted. – marsnebulasoup Aug 04 '20 at 13:44
  • 2
    There are characters that have zero width. They are "invisible": they have no visible representation, so if you add them to an existing string, the appearance of that string will not change. – luk2302 Aug 04 '20 at 13:45
  • Ok, I've got it. Thanks guys very much. – LocV's Nest Aug 04 '20 at 13:46
  • 1
    That would make sense, since the spaces are only shown when editing in console and not when rendered in the console. – marsnebulasoup Aug 04 '20 at 13:47
  • 1
    @LocV'sNest If you've found a solution, you can answer your own question for future reference. – marsnebulasoup Aug 04 '20 at 13:48
  • 1
    See e.g. https://qaz.wtf/u/show.cgi?show=a%E2%80%8Bc&type=string - it looks like I only input `ac` but there is a character between the two. You can input your strings there as well and check the contents. – luk2302 Aug 04 '20 at 13:50
  • 1
    You can always use a regular expression like [^\w\s] to remove these characters. \w will match a-zA-Z0-9_ and \s will match various spaces. – imvain2 Aug 04 '20 at 13:50

3 Answers

2

To answer your question directly:

Is it possible for two equal strings to be unequal in JavaScript?

No.

As mentioned in the comments, you have some invisible characters in your strings, which makes them unequal when you compare them.

To fix the problem, remove the invisible characters with a method of your choice (my recommendation would be not to let users input invisible characters in the first place).
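As a rough sketch of what that could look like for the duplicate-tag case: the tag values and the `stripZeroWidth` helper below are made up for illustration, and the helper only covers a few common zero-width characters (U+200B to U+200D and U+FEFF), which may not be the ones in your data.

```javascript
// Hypothetical data: the second tag hides a zero-width space (U+200B).
const tags = ["HumanIty", "Human\u200BIty", "Science"];

// The two tags render the same but are different strings.
console.log(tags[0] === tags[1]);            // false
console.log(tags[0].length, tags[1].length); // 8 9

// Hypothetical helper: removes U+200B..U+200D and U+FEFF only.
function stripZeroWidth(text) {
  return text.replace(/[\u200B-\u200D\uFEFF]/g, "");
}

// A Set keeps one copy of each cleaned tag, in first-seen order.
const uniqueTags = [...new Set(tags.map(stripZeroWidth))];
console.log(uniqueTags); // [ 'HumanIty', 'Science' ]
```

Cleaning before comparing means the stored tags end up normalized as well, so the same duplicates should not reappear later.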

Marco
  • You can't trust any front-end controls and should have backend stripping, as front-end controls are just for UI and can be circumvented if desired. – WBT Aug 04 '20 at 14:05
  • @WBT Yes, of course. I wasn't suggesting a front-end solution. – Marco Aug 04 '20 at 14:07
1

What is the .length property of each string?

If you iterate an index variable over each character position from 0 (inclusive) to length (exclusive), and print the .charCodeAt(index), what do you see?

In doing this, you might see differences between the strings.
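A minimal sketch of that check, assuming a made-up helper named `codeUnits` and the same `a` + U+200B + `c` example string as the qaz.wtf link in the comments:

```javascript
// Hypothetical helper: lists each UTF-16 code unit of a string as "U+XXXX".
function codeUnits(text) {
  const units = [];
  for (let index = 0; index < text.length; index++) {
    const hex = text.charCodeAt(index).toString(16).toUpperCase().padStart(4, "0");
    units.push("U+" + hex);
  }
  return units;
}

const visible = "ac";
const tricky = "a\u200Bc"; // looks like "ac" but contains a zero-width space

console.log(visible.length, codeUnits(visible).join(" ")); // 2 U+0061 U+0063
console.log(tricky.length, codeUnits(tricky).join(" "));   // 3 U+0061 U+200B U+0063
```

Any code unit that shows up in one string but not in the other is the character to look up.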

WBT
  • At first glance, I couldn't believe those look-alike strings were two different strings, so I didn't check the length :( – LocV's Nest Aug 04 '20 at 14:12
  • 2
    @LocV'sNest Welcome to debugging. Good strategies include questioning what you see and assume, and examining whether those assumptions are in fact true. When you find one that isn't, that's a big advance towards fixing the bug. – WBT Aug 04 '20 at 14:34
0

I've found out that one of those two look-alike strings contains a special invisible, zero-width character called the Byte Order Mark (https://www.ionos.com/digitalguide/websites/web-development/byte-order-mark/).

We can strip out such characters with the regex /[^\x20-\x7E]/g, as shown at (https://www.w3resource.com/javascript-exercises/javascript-string-exercise-32.php).
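Here is a small sketch of how that regex behaves. The sample strings are made up, and the second one is there to show a caveat: the pattern keeps only printable ASCII, so it also drops legitimate non-ASCII letters, which may or may not be acceptable for your tags.

```javascript
// Matches every character outside the printable ASCII range (space through ~).
const nonPrintableAscii = /[^\x20-\x7E]/g;

// Hypothetical tag with a leading Byte Order Mark (U+FEFF).
const withBom = "\uFEFFHumanIty";
console.log(withBom === "HumanIty");                                // false
console.log(withBom.replace(nonPrintableAscii, "") === "HumanIty"); // true

// Caveat: accented letters are removed too (U+00E9 is "e" with an acute accent).
const accented = "caf\u00E9";
console.log(accented.replace(nonPrintableAscii, "")); // caf
```

If the tags can contain non-English text, a narrower pattern that targets only the invisible characters you actually find would be the safer choice.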

We can detect invisible characters with tools that display each Unicode character in a string (https://qaz.wtf/u/show.cgi?show=a%E2%80%8Bc&type=string).

LocV's Nest