11

From what I understand, a Unicode character has several representations, e.g., its code point or the hex bytes of its encoding (under UTF-8, the two are the same only for ASCII characters).
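To make the difference between the two representations concrete, here is a small Python sketch. Python is used only for illustration and has nothing to do with Sublime Text itself; the helper name `describe` is made up for this example.

```python
def describe(ch: str) -> tuple[str, str]:
    """Return (code point as U+XXXX, UTF-8 bytes as spaced hex) for one character."""
    code_point = f"U+{ord(ch):04X}"          # numeric code point, in hex
    utf8_hex = ch.encode("utf-8").hex(" ")   # bytes actually stored under UTF-8
    return code_point, utf8_hex


# "A", a non-breaking space, and a zero width space
for ch in ["A", "\u00a0", "\u200b"]:
    print(describe(ch))
```

For `A` the two representations coincide (`41`), while the zero width space has code point `200B` but is stored as the three bytes `e2 80 8b`.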

If I want to search for a visible Unicode character, I can just copy it and search. This works even if I do not know its underlying Unicode representation. But for characters that are not easily visible, such as the zero width space, that approach does not work well. For these characters, we may want to search by code point.

My question

If I know a character's code point, how do I search for it in Sublime Text using a regular expression? I mention Sublime Text specifically because different editors may use different syntax.

jdhao

2 Answers

9
  1. Zero width space characters can be found via:

\x{200b}


  2. Non-breaking space characters can be found via:

\xa0


CinCout
  • I have seen people using `\x` followed by code point of some characters without curly braces, e.g., [here](https://stackoverflow.com/a/13995175/6064933). But for some characters, the curly braces seem important: without curly braces, we can not find them. What is the rule behind this? – jdhao Dec 13 '17 at 06:32
  • 1
    If the hex representation of the code point fits in at most two digits, curly braces are optional; otherwise they are required. – CinCout Dec 13 '17 at 06:36
5

For a Unicode character whose code point is CODE_POINT (written in hexadecimal), we can safely use a regular expression of the form \x{CODE_POINT} to search for it.

General rules

For Unicode characters whose code points fit in two hex digits (U+0000 to U+00FF), it is fine to use \x without curly braces. For characters whose code points need more than two hex digits, you have to put the code point in curly braces after \x.
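If you can paste the character somewhere but cannot see it, a short script can produce the search pattern for you. This is only a sketch in Python; `sublime_pattern` is a hypothetical helper name, and it always emits the curly-brace form so the two-digit rule never matters.

```python
def sublime_pattern(ch: str) -> str:
    """Build a \\x{...} search pattern for a single character (hypothetical helper)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    # ord() gives the code point; format(..., "x") writes it in lowercase hex
    return "\\x{" + format(ord(ch), "x") + "}"


print(sublime_pattern("A"))
print(sublime_pattern("\u200b"))
```

The printed strings can be pasted into the Find panel with regular expression mode enabled.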

Some examples

For example, to find the character A, you can use either \x{41} or \x41.

As another example, to find 我 (its code point is U+6211), you have to use \x{6211} instead of \x6211. If you use \x6211, you will not find the character 我.
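A likely explanation, assuming Sublime Text's regex engine reads the brace-less form as exactly two hex digits (as Perl-style engines generally do): \x6211 is parsed as \x62 (the letter b) followed by the literal text 11. Python's `re` module follows the same two-digit rule, so it can illustrate the effect, although it writes larger code points as `\uXXXX` and does not accept `\x{...}`.

```python
import re

# Without braces, only the first two hex digits belong to the escape:
# \x62 is "b", and the remaining "11" is matched literally.
braceless = re.compile(r"\x6211")
print(bool(braceless.fullmatch("b11")))    # matches the text "b11"
print(bool(braceless.search("\u6211")))    # does not match U+6211

# Python's own spelling for a four-digit code point is \uXXXX.
by_code_point = re.compile(r"\u6211")
print(bool(by_code_point.fullmatch("\u6211")))
```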

Community
jdhao
  • I could successfully search `a0` without curly braces. – CinCout Dec 15 '17 at 18:31
  • I mean it is `better` to use curly braces, not that you `must`. If you treat these two cases separately, you may forget the curly braces when dealing with Unicode characters whose code points need more than two hex digits. – jdhao Dec 15 '17 at 18:33
  • Also, this doesn't answer your question. – CinCout Dec 15 '17 at 18:37
  • I have edited my answer. I think it now fully answers my question, i.e., `how to search a unicode character using its code point`. More feedback is welcome. I answered my own question because my edit to your answer was rejected. I hope that is not the reason for the downvote. – jdhao Dec 15 '17 at 18:46