I am trying to parse a website, and I want to replace all occurrences of "　" in a string. It doesn't seem to be a space or a tab. What is it?

A more general question: how do you look up the name of a character you don't recognize? I tried ANSI and UTF-8 character tables with no result.

Thomas Dickey
muyueh

2 Answers

That's an ideographic space (U+3000). Read more about it here: http://www.fileformat.info/info/unicode/char/3000/index.htm.

user3822146

It is character code 12288 (U+3000), also known as the ideographic space, which is used in East Asian text such as Chinese, Japanese and Korean. You can check this in JavaScript with:

alert( "　".charCodeAt(0) ); // paste the unknown character between the quotes; shows 12288 for U+3000

More info here.
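For the more general question (finding the name of an unknown character), one approach is to print its code point in U+XXXX hex notation and search for that in a Unicode table; the fileformat.info URL above uses the same hex value (3000). This is a minimal sketch: `toUnicodeNotation` and `describeString` are hypothetical helper names, and it uses `codePointAt` rather than `charCodeAt` so characters outside the Basic Multilingual Plane (e.g. emoji) come out as one code point instead of two surrogate halves.

```javascript
// Hypothetical helper: format one character as "U+XXXX" (at least 4 hex digits)
function toUnicodeNotation(ch) {
  const cp = ch.codePointAt(0); // full code point, not a UTF-16 half
  return 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
}

// Hypothetical helper: list every character of a string in that notation.
// Array.from iterates by code point, so surrogate pairs stay together.
function describeString(s) {
  return Array.from(s, (ch) => toUnicodeNotation(ch));
}

console.log(toUnicodeNotation('\u3000')); // U+3000
console.log(describeString('a\u3000b'));  // array: 'U+0061', 'U+3000', 'U+0062'
```

Pasting the mystery character between the quotes instead of `\u3000` gives the same result, and the hex part can then be searched for by name.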

Edit: You can match this character with the regex \s, which in JavaScript includes U+3000. For example, this converts each whitespace character (including tabs and newlines) to a regular space (character 32):

"foo\u3000bar\u3000baz".replace(/\s/g, ' '); // produces "foo bar baz"
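To see what `\s` actually covers, here is a small check of `/\s/` against a few sample characters (the sample names are made up for illustration). In JavaScript, `\s` matches Unicode space separators such as U+00A0 and U+3000 as well as tabs and line terminators, which is why the plain `\s` replacement also rewrites tabs and newlines.

```javascript
// Hypothetical sample set: which of these does JavaScript's \s match?
const samples = {
  space: ' ',
  tab: '\t',
  newline: '\n',
  noBreakSpace: '\u00a0',
  ideographicSpace: '\u3000',
  letterA: 'a',
};

// Build { name: true/false } by testing each character against /\s/
// (no g flag, so there is no lastIndex state carried between calls)
const matches = Object.fromEntries(
  Object.entries(samples).map(([name, ch]) => [name, /\s/.test(ch)])
);

console.log(matches); // every entry except letterA is true
```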

To replace this character but leave "normal" whitespace (space, tab, newline, carriage return) alone, you might try this negative lookahead:

"foo\u3000bar\u3000baz\tblah\tblah\nblah".replace(/(?![ \t\r\n])\s/g, ' '); // produces "foo bar baz\tblah\tblah\nblah"
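If the goal is only to get rid of U+3000, a more direct option (swapped in here, not what the answer suggests) is to match that single code point with `\u3000`. `replaceIdeographicSpaces` is a hypothetical name. Unlike the lookahead pattern, this leaves other Unicode whitespace such as U+00A0 (no-break space), vertical tab and form feed unchanged, whereas the lookahead version rewrites those too.

```javascript
// Replace only U+3000 (ideographic space) with an ordinary space,
// leaving tabs, newlines and all other whitespace untouched.
function replaceIdeographicSpaces(s) {
  return s.replace(/\u3000/g, ' ');
}

const input = 'foo\u3000bar\u3000baz\tblah\tblah\nblah';
const output = replaceIdeographicSpaces(input);
console.log(JSON.stringify(output)); // "foo bar baz\tblah\tblah\nblah"
```

In ES2021+ environments, `s.replaceAll('\u3000', ' ')` does the same without a regex.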
elixenide