How can I find the character code of a special character in my text editor?

Question

When pasting text from outside sources into a plain-text editor (e.g. TextMate or Sublime Text 2), special characters often get pasted in as well. Some of these characters render fine, but depending on the source, others might not display correctly (usually showing up as a question mark with a box around it).

So this is actually 2 questions:

1. Given a special character (e.g., ’ or ♥), can I determine the UTF-8 character code used to display that character from inside my text editor, and/or convert those characters to their character codes?
2. For those "extra-special" characters that come in as garbage, is there any way to figure out what encoding was used to display that character in the source text, and can those characters somehow be converted to UTF-8?

You may use this [online tool](https://www.soscisurvey.de/tools/view-chars.php) to paste strings with unknown characters and see their unicode numbers. — BurninLeo, Oct 19 '16 at 12:30
As a basic aid, I have created a table of all the character codes in the range 0x80-0xFF in the legacy 8-bit encodings known to Python, which I refer to often: https://cdn.rawgit.com/tripleee/8bit/master/encodings.html — tripleee, Jan 05 '18 at 14:20

score 18 · Answer 1 · answered Nov 01 '12 at 03:45

My favorite site for looking up characters is fileformat.info. They have a great Unicode character search that includes a lot of useful information about each character and its various encodings.
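If you would rather check a character locally than look it up on a site, a few lines of Python report the same basics. This is only a minimal sketch: the helper name describe is made up for illustration, and it relies on nothing beyond Python's standard unicodedata module.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point, Unicode name, and UTF-8 bytes of one character."""
    code_point = f"U+{ord(ch):04X}"
    # Some code points (e.g. control characters) have no name, so give a default.
    name = unicodedata.name(ch, "<unnamed>")
    utf8_hex = " ".join(f"{b:02X}" for b in ch.encode("utf-8"))
    return f"{ch}  {code_point}  {name}  UTF-8: {utf8_hex}"


# The two characters from the question.
for ch in "’♥":
    print(describe(ch))
```

For the heart this prints U+2665, BLACK HEART SUIT, and the UTF-8 bytes E2 99 A5. The code point (U+2665) is the character's Unicode number; the bytes are how UTF-8 stores it, and the two are not the same thing.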

If you see the question mark with a box, that means you pasted something that can't be interpreted, often because it's not valid UTF-8 (not every byte sequence is valid UTF-8). One possibility is that it's UTF-16 with a byte order (endianness) that your editor isn't expecting. If you can get the full original source into a file, the file command is often the best tool for determining the encoding.
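If the file command isn't available, the same kind of guess can be roughed out in Python. The sketch below is a heuristic, not a reliable detector: it checks for a byte order mark first, then tries a short list of candidate encodings. The function name guess_encoding and the fallback list are my own assumptions for illustration.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE mark
# (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallbacks=("utf-8", "cp1252", "latin-1")) -> str:
    """Guess an encoding from a byte order mark, else by trial decoding."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in fallbacks:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    # Not reachable with the default list, since latin-1 accepts any bytes.
    return "unknown"


raw = "’".encode("cp1252")            # the single byte 0x92, not valid UTF-8
encoding = guess_encoding(raw)
as_utf8 = raw.decode(encoding).encode("utf-8")
print(encoding, as_utf8)
```

A successful decode only shows that the bytes are legal in that encoding, not that it is the one the source actually used, so check the decoded text by eye before converting a whole file.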

This link was useful, and from there I ended up at http://www.i18nqa.com/debug/utf8-debug.html which shows a table containing some of the usual suspects. — Michael, Sep 18 '13 at 14:34

score 8 · Answer 2 · answered Aug 06 '13 at 16:28

At &what I built a tool to focus on searching for characters. It indexes all the Unicode and HTML entity tables, but also supplements with hacker dictionaries and a database of keywords I've collected, so you can search for words like heart, quot, weather, umlaut, hash, cloverleaf and get what you want. By focusing on search, it avoids having to hunt around the Unicode pages, which can be frustrating. Give it a try.
