119

Unicode allocates U+4E00..U+9FFF to Chinese characters. That range covers a major part of the complete set, but not all of it.

omg
    I would just link a Wikipedia article here: the block ranges change from time to time, so it is better to link to something that is kept up to date rather than give a static answer... https://en.wikipedia.org/wiki/CJK_Unified_Ideographs – user930067 Jun 20 '15 at 04:18

8 Answers

124

The definitive list can be found at Unicode Character Code Charts; search the page for "CJK".

The "East Asian Script" document does mention:

Blocks Containing Han Ideographs

Han ideographic characters are found in five main blocks of the Unicode Standard, as shown in Table 18-1

Table 18-1. Blocks Containing Han Ideographs

Block                                   Range        Comment
CJK Unified Ideographs                  4E00–9FFF    Common
CJK Unified Ideographs Extension A      3400–4DBF    Rare
CJK Unified Ideographs Extension B      20000–2A6DF  Rare, historic
CJK Unified Ideographs Extension C      2A700–2B73F  Rare, historic
CJK Unified Ideographs Extension D      2B740–2B81F  Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820–2CEAF  Rare, historic
CJK Unified Ideographs Extension F      2CEB0–2EBEF  Rare, historic
CJK Unified Ideographs Extension G      30000–3134F  Rare, historic
CJK Unified Ideographs Extension H      31350–323AF  Rare, historic
CJK Compatibility Ideographs            F900–FAFF    Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800–2FA1F  Unifiable variants

Note: this table is current as of Unicode 15.0. The block ranges can evolve over time: latest is in CJK Unified Ideographs.
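As a rough illustration of how the table might be used, the sketch below looks up which of these blocks a character falls in. The `HAN_BLOCKS` array and the `hanBlockOf` helper are names invented for this example, and the ranges are simply copied from the table as of Unicode 15.0, so they would need updating for later versions.

```javascript
// Block ranges copied from Table 18-1 above (Unicode 15.0); illustrative only.
const HAN_BLOCKS = [
  ["CJK Unified Ideographs", 0x4E00, 0x9FFF],
  ["CJK Unified Ideographs Extension A", 0x3400, 0x4DBF],
  ["CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF],
  ["CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F],
  ["CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F],
  ["CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF],
  ["CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF],
  ["CJK Unified Ideographs Extension G", 0x30000, 0x3134F],
  ["CJK Unified Ideographs Extension H", 0x31350, 0x323AF],
  ["CJK Compatibility Ideographs", 0xF900, 0xFAFF],
  ["CJK Compatibility Ideographs Supplement", 0x2F800, 0x2FA1F],
];

// Hypothetical helper: returns the block name for the first character of `ch`,
// or null if it is in none of the listed blocks.
function hanBlockOf(ch) {
  const cp = ch.codePointAt(0); // codePointAt handles characters beyond U+FFFF
  const hit = HAN_BLOCKS.find(([, first, last]) => cp >= first && cp <= last);
  return hit ? hit[0] : null;
}

console.log(hanBlockOf("中")); // CJK Unified Ideographs
console.log(hanBlockOf("A"));  // null
```

A linear scan is enough for eleven ranges; a sorted array with binary search would only matter for much longer lists.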

There are also

Kangxi Radicals                         2F00–2FDF
CJK Radicals Supplement                 2E80–2EFF

which contain characters which may find their way into regular text, as well as

CJK Symbols and Punctuation             3000–303F

See also Wikipedia:

See also Unihan Database (which organizes information relating to the properties of CJK Unified Ideographs)

Calion
VonC
  • You might also want to include U+AC00 – U+D7AF (Hangul Syllables). – Flimm Apr 01 '13 at 18:07
  • 16
    @Flimm: Hangul is not part of the Chinese standard; Hangul is Korean. Korean *does* use Hanja ("Chinese script"), but rarely, and only for some traditional things (like last names, monuments, places...) which can't be transcribed in Hangul. The OP asked about Chinese specifically, so there was no need for the Responder to include Hangul. :-) – omninonsense Dec 24 '13 at 20:41
  • 1
    List seems to not cover punctuation ("。"). – Michał Woliński May 19 '16 at 13:55
  • 1
    @MichałWoliński [CJK Symbols and Punctuation](http://www.fileformat.info/info/unicode/block/cjk_symbols_and_punctuation/index.htm) range is 3000-303F – Mariano Oct 18 '16 at 10:39
  • I learned that CJK Unified Ideographs Extension A is from 3400 to 4dbf rather than 3400 to 4dff. – Lerner Zhang Dec 15 '16 at 02:11
  • @Lerner Thank you. I have fixed that range, and added wikipedia link for illustrating the complete content. – VonC Dec 15 '16 at 05:36
  • This is not a complete list. For instance, it misses the very common character 门, which appears in the [CJK Radicals Supplement](https://www.unicode.org/charts/PDF/U2E80.pdf). Also, there are now extensions all the way up to H at https://www.unicode.org/charts/. – Calion Oct 30 '22 at 14:05
  • 1
    @Calion Thank you for this feedback. I have included your comment in the answer for more visibility. – VonC Oct 30 '22 at 14:51
  • Also, the range for Extension C is wrong. It's listed as 2A700–2B73F, but it's 2A700–2B739 (https://www.unicode.org/charts/PDF/U2A700.pdf). – Calion Oct 30 '22 at 14:54
  • 1
    @Calion Thank you. Don't hesitate to edit this answer directly if you see any other mistakes. I will validate your edits. – VonC Oct 30 '22 at 14:55
58

Unicode currently has 74605 CJK characters. CJK characters include not only characters used in Chinese, but also Japanese Kanji, Korean Hanja, and Vietnamese Chu Nom. Some CJK characters are not Chinese characters.

1) 20941 characters from the CJK Unified Ideographs block.

Code points U+4E00 to U+9FCC.

  1. U+4E00 - U+62FF
  2. U+6300 - U+77FF
  3. U+7800 - U+8CFF
  4. U+8D00 - U+9FCC

2) 6582 characters from the CJKUI Ext A block.

Code points U+3400 to U+4DB5. Unicode 3.0 (1999).

3) 42711 characters from the CJKUI Ext B block.

Code points U+20000 to U+2A6D6. Unicode 3.1 (2001).

  1. U+20000 - U+215FF
  2. U+21600 - U+230FF
  3. U+23100 - U+245FF
  4. U+24600 - U+260FF
  5. U+26100 - U+275FF
  6. U+27600 - U+290FF
  7. U+29100 - U+2A6DF

4) 4149 characters from the CJKUI Ext C block.

Code points U+2A700 to U+2B734. Unicode 5.2 (2009).

5) 222 characters from the CJKUI Ext D block.

Code points U+2B740 to U+2B81D. Unicode 6.0 (2010).

6) CJKUI Ext E block.

Coming soon

If the above is not spaghetti enough, take a look at known issues. Have fun =)
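The per-block counts quoted above can be cross-checked against the code point ranges, since each count equals the size of its inclusive range. The following is a small sketch of that arithmetic; the `ranges` table and the `rangeSize` helper are made up for this example, and the figures are the ones quoted in this answer, not re-derived from current Unicode data.

```javascript
// First and last assigned code points as quoted in the answer above.
const ranges = [
  ["CJK Unified Ideographs", 0x4E00, 0x9FCC],
  ["Ext A", 0x3400, 0x4DB5],
  ["Ext B", 0x20000, 0x2A6D6],
  ["Ext C", 0x2A700, 0x2B734],
  ["Ext D", 0x2B740, 0x2B81D],
];

// Size of an inclusive range: last - first + 1.
const rangeSize = (first, last) => last - first + 1;

const sizes = ranges.map(([, first, last]) => rangeSize(first, last));
console.log(sizes);                          // 20941, 6582, 42711, 4149, 222
console.log(sizes.reduce((a, b) => a + b));  // 74605
```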

Pacerier
  • 1
    Hi, can you give an example of a CJK ideograph (preferably from the basic plane) that is not a Chinese character? I thought that characters from other languages (Japanese, Korean) which are not also Chinese characters appear in another block (for example the Hangul Jamo block, in the case of Korean)... – Adam Burley Feb 02 '17 at 23:35
  • Try looking at 'Gukja', 'Kokuji', and 'Chữ Nôm'. U+4E44, 乄, is a Japanese-only CJK character. – Ṃųỻịgǻňạcểơửṩ Nov 22 '19 at 15:30
  • It seems that in recent years, CJKUI Ext B block has grown by 9 characters to 42720 with versions 13.0 and 14.0. See https://en.wikipedia.org/wiki/CJK_Unified_Ideographs_Extension_B – Ben Mares Oct 13 '21 at 19:23
31

The exact ranges for Chinese characters (except the extensions) are [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD].

  1. [\u2e80-\u2fd5]

This range spans two blocks. CJK Radicals Supplement (U+2E80–U+2EFF) is a Unicode block containing alternative, often positional, forms of the Kangxi radicals. They are used as headers in dictionary indices and other CJK ideograph collections organized by radical-stroke. The Kangxi Radicals block (U+2F00–U+2FD5) follows it.

  2. [\u3190-\u319f]

Kanbun is a Unicode block containing annotation characters used in Japanese copies of classical Chinese texts, to indicate reading order.

  3. [\u3400-\u4DBF]

CJK Unified Ideographs Extension-A is a Unicode block containing rare Han ideographs.

  4. [\u4E00-\u9FCC]

CJK Unified Ideographs is a Unicode block containing the most common CJK ideographs used in modern Chinese and Japanese.

  5. [\uF900-\uFAAD]

CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain round-trip compatibility between Unicode and those encodings.

For the details please refer to here, and the extensions are provided in other answers.
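As an illustration only, the character class above can be dropped into a JavaScript regular expression more or less as written, because every range in it lies in the Basic Multilingual Plane. The names `BMP_CHINESE` and `containsBmpChinese` are invented for this sketch; Extension B and later are deliberately not covered.

```javascript
// Character class built from the BMP ranges quoted above.
// Extensions B and later lie outside the BMP and are not matched here.
const BMP_CHINESE = /[\u2E80-\u2FD5\u3190-\u319F\u3400-\u4DBF\u4E00-\u9FCC\uF900-\uFAAD]/;

// Hypothetical helper: true if the string contains at least one match.
function containsBmpChinese(text) {
  return BMP_CHINESE.test(text);
}

console.log(containsBmpChinese("Hello 世界")); // true
console.log(containsBmpChinese("Hello"));      // false

// A global copy of the same pattern extracts every matching character.
const all = "a中b文c".match(new RegExp(BMP_CHINESE.source, "g"));
console.log(all.join("")); // 中文
```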

Lerner Zhang
  • Could the one who has downvoted this answer please tell me the reason? – Lerner Zhang Feb 27 '17 at 06:52
  • 2
    I didn't downvote, but what about extension B, C, D, and E? – Suragch Feb 27 '17 at 07:35
  • @Suragch Those extensions have been provided correctly in other answers, hence there is no need for me to rewrite it. I only clearly separated the ranges in between. – Lerner Zhang Feb 27 '17 at 08:48
  • 1. The range of CJK Radicals Supplement is 2E80–2EFF. 2. Kangxi Radicals are not Chinese characters; they are graphical components of Chinese characters, used specifically to express radicals, e.g. ⼻ (U+2F3B) and 彳 (U+5F73), ⻜ (U+2EDC) and 飞 (U+98DE). 3. If you think Kanbun characters are Chinese characters, why not CJK Compatibility Ideographs? Why not Enclosed CJK Letters and Months? – Voyager Mar 13 '19 at 03:12
  • @rambler Thanks for your advice. I think that when we process Chinese characters we should consider Kangxi Radicals and Kanbun. CJK Compatibility Ideographs are good, but Enclosed CJK Letters and Months are too rare and I don't think we should consider them. – Lerner Zhang Mar 13 '19 at 13:36
15

Unicode version 11.0.0

In Unicode the Chinese, Japanese and Korean (CJK) scripts share a common background, collectively known as CJK characters.

These ranges often contain unassigned or reserved code points (such as U+2E9A and U+2EF4–U+2EFF).

Chinese characters

bottom  top     reference (also have a look at wiki page)   block name
4E00    9FEF    http://www.unicode.org/charts/PDF/U4E00.pdf CJK Unified Ideographs
3400    4DBF    http://www.unicode.org/charts/PDF/U3400.pdf CJK Unified Ideographs Extension A
20000   2A6DF   http://www.unicode.org/charts/PDF/U20000.pdf    CJK Unified Ideographs Extension B
2A700   2B73F   http://www.unicode.org/charts/PDF/U2A700.pdf    CJK Unified Ideographs Extension C
2B740   2B81F   http://www.unicode.org/charts/PDF/U2B740.pdf    CJK Unified Ideographs Extension D
2B820   2CEAF   http://www.unicode.org/charts/PDF/U2B820.pdf    CJK Unified Ideographs Extension E
2CEB0   2EBEF   https://www.unicode.org/charts/PDF/U2CEB0.pdf   CJK Unified Ideographs Extension F
3007    3007    https://zh.wiktionary.org/wiki/%E3%80%87    in block CJK Symbols and Punctuation
                
  • In CJK Unified Ideographs block, I notice many answers use upper bound 9FCC, but U+9FCD(鿍) is indeed a Chinese char. And all characters in this block are Chinese characters (also used in Japanese or Korean etc.).
  • Most characters in the CJK Unified Ideographs extensions are traditional Chinese characters that are rarely used in China (the exception is Ext F, where only 17% are Chinese characters).
  • 〇 is the Chinese character form of zero and still in use today

Therefore the range is

[0x3007,0x3007],[0x3400,0x4DBF],[0x4E00,0x9FEF],[0x20000,0x2EBEF]
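A minimal sketch of applying these ranges to a string follows. The last range here ends at 0x2EBEF, the end of Extension F in the table above. `CHINESE_RANGES`, `isChineseCodePoint` and `countChinese` are hypothetical names. The point of the example is that characters above U+FFFF occupy two UTF-16 units in JavaScript, so the string is walked by code point rather than by index.

```javascript
// Inclusive [first, last] code point ranges (Unicode 11.0 figures from above).
const CHINESE_RANGES = [
  [0x3007, 0x3007],   // 〇, ideographic number zero
  [0x3400, 0x4DBF],   // Extension A
  [0x4E00, 0x9FEF],   // CJK Unified Ideographs
  [0x20000, 0x2EBEF], // Extensions B through F
];

function isChineseCodePoint(cp) {
  return CHINESE_RANGES.some(([first, last]) => cp >= first && cp <= last);
}

// Counts matching characters; for...of iterates by code point, not UTF-16 unit.
function countChinese(text) {
  let n = 0;
  for (const ch of text) {
    if (isChineseCodePoint(ch.codePointAt(0))) n += 1;
  }
  return n;
}

console.log(countChinese("\u3007 \u9FCD \u{20000} abc")); // 3
console.log(countChinese("abc\u3002"));                   // 0 (U+3002 is punctuation)
```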

CJK characters but never used in Chinese

These are Han characters encoded only for compatibility.

They almost never appear in Chinese books, articles, or other writing.

All characters here have one corresponding glyph-identical Chinese character, such as 金(U+F90A) and 金(U+91D1), they are identical glyphs.

 F900    FAFF   https://www.unicode.org/charts/PDF/UF900.pdf  CJK Compatibility Ideographs
2F800   2FA1F   https://www.unicode.org/charts/PDF/U2F800.pdf CJK Compatibility Ideographs Supplement
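One practical consequence, sketched below under the assumption that the runtime implements standard Unicode normalization (as `String.prototype.normalize` does): a compatibility ideograph such as U+F90A has a canonical mapping to its unified counterpart U+91D1, so normalizing text before matching folds the compatibility form away.

```javascript
const compat = "\uF90A";  // CJK COMPATIBILITY IDEOGRAPH-F90A
const unified = "\u91D1"; // the same glyph in the CJK Unified Ideographs block

console.log(compat === unified);                  // false: different code points
console.log(compat.normalize("NFC") === unified); // true after normalization
console.log(compat.normalize("NFC").codePointAt(0).toString(16)); // 91d1
```

Normalizing input first means a matcher only needs the unified ranges, at the cost of no longer being able to tell which form the source text used.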

CJK related symbols

2E80    2EFF    http://www.unicode.org/charts/PDF/U2E80.pdf CJK Radicals Supplement
            
2F00    2FDF    http://www.unicode.org/charts/PDF/U2F00.pdf Kangxi Radicals 
2FF0    2FFF    https://unicode.org/charts/PDF/U2FF0.pdf    Ideographic Description Characters
3000    303F    https://www.unicode.org/charts/PDF/U3000.pdf    CJK Symbols and Punctuation
3100    312F    https://unicode.org/charts/PDF/U3100.pdf    Bopomofo
31A0    31BF    https://unicode.org/charts/PDF/U31A0.pdf    Bopomofo Extended
31C0    31EF    http://www.unicode.org/charts/PDF/U31C0.pdf CJK Strokes
3200    32FF    https://unicode.org/charts/PDF/U3200.pdf    Enclosed CJK Letters and Months
3300    33FF    https://unicode.org/charts/PDF/U3300.pdf    CJK Compatibility
FE30    FE4F    https://www.unicode.org/charts/PDF/UFE30.pdf    CJK Compatibility Forms
FF00    FFEF    https://www.unicode.org/charts/PDF/UFF00.pdf    Halfwidth and Fullwidth Forms
1F200   1F2FF   https://www.unicode.org/charts/PDF/U1F200.pdf   Enclosed Ideographic Supplement
  • Some blocks, such as Hangul Compatibility Jamo, are excluded because they have no relation to Chinese.
  • Kangxi Radicals are not Chinese characters; they are graphical components of Chinese characters, used specifically to express radicals, e.g. ⼻ (U+2F3B) and 彳 (U+5F73), ⻜ (U+2EDC) and 飞 (U+98DE).
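The radical-versus-ideograph distinction shows up in code as well. In this sketch, the Kangxi radical U+2F3B and the ideograph U+5F73 compare as different strings, and only compatibility normalization (NFKC) maps the radical onto the ideograph. This relies on the Kangxi radical's compatibility mapping; it should not be assumed that every character in CJK Radicals Supplement has one.

```javascript
const radical = "\u2F3B";   // KANGXI RADICAL STEP
const ideograph = "\u5F73"; // the look-alike unified ideograph

console.log(radical === ideograph);                   // false
console.log(radical.normalize("NFKC") === ideograph); // true
console.log(radical.normalize("NFC") === ideograph);  // false: NFC keeps the radical
```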

Other common punctuation appearing in Chinese

This is a wide range: some of this punctuation may never be used, while some marks, such as ……”“, are used very often in Chinese.

0000    007F    https://unicode.org/charts/PDF/U0000.pdf    C0 Controls and Basic Latin 
2000    206F    https://unicode.org/charts/PDF/U2000.pdf    General Punctuation
……

There are also many Chinese-related symbols, such as Yijing Hexagram Symbols or Kanbun, but they are off-topic here. I list the non-Chinese characters in the CJK blocks to better explain what Chinese characters are. The ranges above already cover almost all the characters that appear in Chinese writing, except math and other specialty notation.

Supplementary

CJK Symbols and Punctuation

 、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽 〾 〿

Halfwidth and Fullwidth Forms

!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○

References

  1. https://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97 (in Chinese; note the sidebar on the right)
  2. https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%9B%B8%E5%AE%B9%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97 (notice the bottom table)
  3. http://www.unicode.org
tripleee
Voyager
3

The Unicode blocks that the other answers gave certainly cover most Chinese characters, but check out some of these other blocks, too.

CJK_UNIFIED_IDEOGRAPHS
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_C
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_D
CJK_UNIFIED_IDEOGRAPHS_EXTENSION_E
CJK_COMPATIBILITY
CJK_COMPATIBILITY_FORMS
CJK_COMPATIBILITY_IDEOGRAPHS
CJK_COMPATIBILITY_IDEOGRAPHS_SUPPLEMENT
CJK_RADICALS_SUPPLEMENT
CJK_STROKES
CJK_SYMBOLS_AND_PUNCTUATION
ENCLOSED_CJK_LETTERS_AND_MONTHS
ENCLOSED_IDEOGRAPHIC_SUPPLEMENT
KANGXI_RADICALS
IDEOGRAPHIC_DESCRIPTION_CHARACTERS

See my fuller discussion here. And this site is convenient for browsing Unicode.
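An alternative to listing blocks by hand, shown here only as a sketch, is to match on the Unicode script property. JavaScript regular expressions support `\p{Script=Han}` when the `u` flag is set (ES2018 and later). Script and block are different classifications, so this is not equivalent to the block list above: for instance, CJK punctuation has script Common and is not matched. `HAN` and `hanOnly` are names invented for the example.

```javascript
// \p{Script=Han} matches by Unicode script property, not by block name.
// The "u" flag is required for property escapes.
const HAN = /\p{Script=Han}/u;

console.log(HAN.test("汉")); // true
console.log(HAN.test("。")); // false: CJK punctuation has script Common
console.log(HAN.test("あ")); // false: hiragana

// Hypothetical helper: keep only the Han-script characters of a string.
function hanOnly(text) {
  return [...text].filter((ch) => HAN.test(ch)).join("");
}

console.log(hanOnly("汉字 kanji 漢字。")); // 汉字漢字
```

The set matched depends on the Unicode version bundled with the JavaScript engine, so newly added extensions are picked up only after the runtime is updated.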

Community
Suragch
3

Unicode continually evolves. The current goal is: "A new major version of the standard will be released each year. Starting with Unicode 14.0, each of those releases is targeted for the third quarter of each year."

There is no single community wiki that someone regularly updates, so to stay up to date with corrections and additional extensions, be sure to double-check the latest standard, always found at https://www.unicode.org/versions/latest/, and look for the East Asia chapter (unless that one day gets split as well).

As of this initial writing, the latest is v14, and Ch 18 "presents scripts used in East Asia. This includes major writing systems associated with Chinese, Japanese, and Korean. It also includes several scripts for minority languages". The first table reviews Blocks Containing Han Ideographs where we see they've gone up to Extension G:

Block                                   Range       Comment
-----------------------------------------------------------
CJK Unified Ideographs                  4E00–9FFF   Common
CJK Unified Ideographs Extension A      3400–4DBF   Rare
CJK Unified Ideographs Extension B      20000–2A6DF Rare, historic
CJK Unified Ideographs Extension C      2A700–2B73F Rare, historic
CJK Unified Ideographs Extension D      2B740–2B81F Uncommon, some in current use
CJK Unified Ideographs Extension E      2B820–2CEAF Rare, historic
CJK Unified Ideographs Extension F      2CEB0–2EBEF Rare, historic
CJK Unified Ideographs Extension G      30000–3134F Rare, historic
CJK Compatibility Ideographs            F900–FAFF   Duplicates, unifiable variants, corporate characters
CJK Compatibility Ideographs Supplement 2F800–2FA1F Unifiable variants

The second table Small Extensions to CJK Blocks notes additions: "The repertoire in the CJK Unified Ideographs block has subsequently been extended with small sets of unified ideographs or ideographic components needed for interoperability with various standards, or for other reasons, as shown in Table 18-2", some of which "have involved reserved ranges at the end of other CJK blocks."
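Rather than copying ranges by hand, one option is to read them from the Unicode Character Database file Blocks.txt, whose data lines have the form `4E00..9FFF; CJK Unified Ideographs`. The sketch below parses text in that format. The `SAMPLE` string is a hand-copied excerpt for illustration, not the real file, and `parseBlocks` is a hypothetical helper.

```javascript
// Blocks.txt-style excerpt, hand-copied for illustration.
const SAMPLE = `
# comment lines start with a hash and are skipped
3400..4DBF; CJK Unified Ideographs Extension A
4DC0..4DFF; Yijing Hexagram Symbols
4E00..9FFF; CJK Unified Ideographs
AC00..D7AF; Hangul Syllables
20000..2A6DF; CJK Unified Ideographs Extension B
`;

// Parses "FIRST..LAST; Name" lines into { first, last, name } objects.
function parseBlocks(text) {
  const blocks = [];
  for (const line of text.split("\n")) {
    const m = /^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$/.exec(line.trim());
    if (m) {
      blocks.push({ first: parseInt(m[1], 16), last: parseInt(m[2], 16), name: m[3] });
    }
  }
  return blocks;
}

const unifiedBlocks = parseBlocks(SAMPLE)
  .filter((b) => b.name.startsWith("CJK Unified Ideographs"));
console.log(unifiedBlocks.map((b) => b.name));
```

Filtering by name prefix picks up future "CJK Unified Ideographs Extension" blocks automatically once the input file is refreshed.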

For additional related blocks such as punctuation and other syllabaries (including for J+K) which should be more stable, check out that unicode chapter further as well as other answers around here, and https://en.wikipedia.org/wiki/Han_unification#Unicode_ranges. https://blog.miniasp.com/post/2019/01/02/Common-Regex-patterns-for-Unicode-characters has some interesting discussion as well even though it was written in 2019.

For fonts that try to render these, see https://en.wikipedia.org/wiki/List_of_CJK_fonts, but note that coverage information is sparse. You'll have to dig around to see those details, e.g. Adobe/Google's Source Han/Noto fonts don't cover all extensions or compatibility ideographs.

qix
1

To summarize, it sounds like these are them:

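// Inclusive [first, last] code point ranges collected from the answers above.
// Overlapping and adjacent sub-ranges from the original list have been merged.
const blocks = [
  [0x2E80, 0x2FD5],   // CJK Radicals Supplement and Kangxi Radicals
  [0x3190, 0x319F],   // Kanbun
  [0x3400, 0x4DBF],   // CJK Unified Ideographs Extension A
  [0x4E00, 0x9FCC],   // CJK Unified Ideographs
  [0xF900, 0xFAAD],   // CJK Compatibility Ideographs
  [0x20000, 0x2A6DF], // CJK Unified Ideographs Extension B
  [0x2A700, 0x2B734], // CJK Unified Ideographs Extension C
  [0x2B740, 0x2B81D]  // CJK Unified Ideographs Extension D
];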
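Lists assembled from several sources like this one tend to pick up overlapping or adjacent ranges. A possible way to tidy them, sketched here with a hypothetical `mergeRanges` helper and a small sample taken from the ranges quoted in the answers, is to sort by start and fold each range into its predecessor when they touch.

```javascript
// Merges overlapping or directly adjacent inclusive [first, last] ranges.
function mergeRanges(ranges) {
  const sorted = ranges.map((r) => [...r]).sort((a, b) => a[0] - b[0]);
  const merged = [];
  for (const [first, last] of sorted) {
    const prev = merged[merged.length - 1];
    if (prev && first <= prev[1] + 1) {
      prev[1] = Math.max(prev[1], last); // overlaps or touches the previous range
    } else {
      merged.push([first, last]);
    }
  }
  return merged;
}

const sample = [
  [0x3400, 0x4DB5], [0x4E00, 0x62FF], [0x6300, 0x77FF],
  [0x3400, 0x4DBF], [0x4E00, 0x9FCC],
];
const tidy = mergeRanges(sample);
console.log(tidy.map(([a, b]) => a.toString(16) + "-" + b.toString(16)));
// 3400-4dbf and 4e00-9fcc
```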
Lance
0

The short answer

Any block whose name contains "CJK Unified Ideographs" encodes Chinese characters, and block names containing "CJK", "Kangxi", "Bopomofo", "Fullwidth", or "Yijing" suggest characters used in Chinese text. There are over 90000 such ideographs, excluding regional variants.

The Unicode ranges that seem specifically for CJK text that can be used for Chinese text are:

  • 2E80 .. 4DFF (partially; punctuation, supplementary, one unused block of 16)
    • 2E80 .. 303F (punctuation, reference)
    • 3100 .. 312F, 31A0 .. 31BF (bopomofo)
    • 31C0 .. 31EF (strokes)
  • 3400 .. 4DFF (extension A and Yijing hexagrams)
  • 4E00 .. 9FFF (commonly used Chinese and/or Japanese characters)
  • A few blocks from the Fxxx range (mostly compatibility use only)
  • 1F260 .. 1F265 (Chinese folk religion symbols)
  • 20000 .. 2FFFD (All of SIP)
  • 30000 .. 3FFFD (All of TIP)

The long answer

Chinese characters are a subset of the Han characters used in Chinese, Japanese, and Korean, collectively referred to as CJK. The thing is, not all characters in your range are actually seen in Chinese use; a few are exclusively for Japanese use. Han unification has also been fraught with controversy. Also, Vietnamese formerly used the Han script, including a large number of locally created characters called Chữ Nôm.

Moreover, the Chinese characters are actually termed CJK Unified Ideographs in Unicode. The term "unified" means that characters considered identical share one code point, even though they might be written slightly differently in different regions. This is most noticeable when comparing Chinese text against Japanese shinjitai.

The set of Chinese / CJK characters is constantly growing and is already around 90000. New blocks of CJK characters are being created, and a few additional characters are filled into the remaining spaces of CJK blocks that haven't been filled.

The current list

I have compiled the list of all Unicode blocks that are specifically for CJK use containing characters that are suitable for Chinese text, even though most are a mix between characters used exclusively for Chinese and others exclusively for Japanese or Korean. I have added these notes based on my own observations, please edit if you have any questions or points of improvement.

A large portion of Unicode encodings are specifically for CJK text, though Unicode did not separate characters exclusive to Chinese apart from characters exclusive to Japanese, &c. Private use characters are also commonly used for encoding Chinese characters.

The following list covers all the blocks that are for Chinese text. CJK blocks never used for Chinese text, such as kana and jamo, are excluded. They are organized by usage first, so the code points may seem out of order. I cannot give a comprehensive list of all CJK characters that "are Chinese" or "are Japanese" or so on.

The Ideographs

These ideographs are unified, and are abstractly considered the same character across all three (or four) languages besides being simplified/traditional, or casual/financial. Exceptions may occur due to source separation.

Block Name                          Range         Count  Release       Notes (tofu warning!)
CJK Unified Ideographs              4E00..9FFF    20992  1.0.1 ~ 14.0  Contains all the commonly used characters. All Jōyō characters are featured there. Full. Example: 汉, 漢, 中
CJK Compatibility Ideographs        F900..FAFF    12     1.0.1         Block contains 472 characters in total. 12 of them are rarely used kokuji; the rest are duplicates. Example: 﨏, 﨤, 﨨
CJK Unified Ideographs Extension A  3400..4DBF    6592   3.0 ~ 13.0    Contains rarely used characters in the BMP. Full. Example: 㐂, 㝉, 䶹
CJK Unified Ideographs Extension B  20000..2A6DF  42720  3.1 ~ 14.0    Contains rarely used characters in the SIP, including the remainder of Kangxi characters.
CJK Unified Ideographs Extension C  2A700..2B73F  4154   5.2 ~ 15.0
CJK Unified Ideographs Extension D  2B740..2B81F  222    6.0
CJK Unified Ideographs Extension E  2B820..2CEAF  5762   8.0
CJK Unified Ideographs Extension F  2CEB0..2EBEF  7473   10.0
CJK Unified Ideographs Extension G  30000..3134F  4939   13.0          Contains the biang ideograph.
CJK Unified Ideographs Extension H  31350..323AF  4192   15.0

As of now, there are 97058 unique CJK characters, including CJK characters never used in Chinese text, but excluding regional conventions.

An upcoming ninth extension of the set of unified ideographs is underway for Version 16.0, containing 622 characters from 2EBF0..2EE5D.
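The totals quoted here can be checked with simple arithmetic. The sketch below adds up the Count column of the table above and computes the size of the inclusive range 2EBF0..2EE5D; the `rows` array is just the table's figures re-typed for this example.

```javascript
// Count column of the ideograph table above, re-typed for illustration.
const rows = [
  ["CJK Unified Ideographs", 20992],
  ["CJK Compatibility Ideographs (unified ones only)", 12],
  ["Extension A", 6592],
  ["Extension B", 42720],
  ["Extension C", 4154],
  ["Extension D", 222],
  ["Extension E", 5762],
  ["Extension F", 7473],
  ["Extension G", 4939],
  ["Extension H", 4192],
];

const totalIdeographs = rows.reduce((sum, [, count]) => sum + count, 0);
console.log(totalIdeographs); // 97058

// Size of the inclusive range 2EBF0..2EE5D mentioned for the next extension.
const nextExtensionSize = 0x2EE5D - 0x2EBF0 + 1;
console.log(nextExtensionSize); // 622
```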

Other CJK characters

These are analogous to non-letter characters in English text. Certain characters, such as punctuation, are specifically for Chinese or CJK use.

Block                               Range       Usage
CJK Symbols and Punctuation         3000..303F  Standard forms of punctuation. The ideograph for "zero", 「〇」, is there. Not all are seen in Chinese text, such as 々.
CJK Strokes                         31C0..31EF  Intended for dictionary, educational, or reference use.
Kangxi Radicals                     2F00..2FDF  The 214 Kangxi Radicals in their original forms. For dictionary use.
CJK Radicals Supplement             2E80..2EFF  Simplified, alternative, and positional forms of radicals.
Ideographic Description Characters  2FF0..2FFF  Used to describe certain CJK characters by component. Can also represent an ideograph that is unencoded.
Bopomofo                            3100..312F  For pronunciation guides.
Bopomofo Extended                   31A0..31BF  For phonetically representing other varieties of spoken Chinese.
Halfwidth and Fullwidth Forms       FF00..FFEF  Alternative forms of Latin, kana, jamo, and symbols to fit in a CJK character or half of a CJK character. Still commonly used in Chinese and Japanese.

Supplementary characters

Several other blocks are reserved for special or stylistic usage, as well as for compatibility reasons. Many are not used in Chinese text.

Block                            Range         Usage
Enclosed CJK Letters and Months  3200..32FF    Contains measurement units treated as an individual CJK character. Mostly intended for Korean or Japanese usage.
Enclosed Ideographic Supplement  1F200..1F2FF  Except for the Chinese folk religion symbols, mostly intended for Japanese use. Many also have emoji forms.
CJK Compatibility                3300..33FF    Contains commonly used precomposed characters arranged in a specific way to fit in a CJK square. Mostly intended for Korean or Japanese usage.
Yijing Hexagram Symbols          4DC0..4DFF    Features the 64 hexagrams from I Ching, an ancient Chinese divination text.

Compatibility characters

Unicode has encoded additional Chinese and other CJK characters specifically for compatibility use only. These blocks are encoded at the end of their respective planes. Their effects can be emulated through other means such as language tags or CSS classes. They are rarely seen in web pages.

Block                                    Range         Usage
CJK Compatibility Ideographs             F900..FAFF    Ensures round-trip compatibility for various legacy encodings that may encode a particular ideograph in multiple locations. There are 460 such duplicates as of Version 15.0. Most are from Korean and Japanese encodings. The one Chinese encoding is Big5, which encodes 兀 at both U+5140 and U+FA0C. 12 are actually unified characters.
CJK Compatibility Ideographs Supplement  2F800..2FA1F  Ensures round-trip compatibility for the CNS 11643-1992 encoding. Many are variant characters otherwise unified by the unification rule. 542 characters are encoded and 2 are reserved; none are unified characters. For example, U+2F804 is a compatibility variant of 你 (U+4F60).
CJK Compatibility Forms                  FE30..FE4F    For compatibility with CNS 11643. Specifically for vertical text.
Vertical Forms                           FE10..FE1F    For compatibility with GB 18030. Specifically for vertical text.
Small Form Variants                      FE50..FE6F    For compatibility with CNS 11643. They are usually rendered fullwidth.

Chinese or Japanese?

Several characters are exclusive to Japanese, including many kokuji and shinjitai forms. An example is the character for "awareness", 覺. The character 覺 is traditional, while 觉 is simplified, and 覚 is a shinjitai form found only in Japanese text.

Additionally, characters that are unified in Unicode might actually be written slightly differently in various locales; this is particularly notable between Chinese and Japanese text. Japanese text is often set in a different font face than Chinese text. In fact, some Japanese characters, such as 漢, are written differently in shinjitai and traditional forms yet are unified.

Chinese Japanese Inherited

The top one is commonly seen in both simplified and traditional Chinese text. The bottom one ("Korean") is seen in traditional printed Chinese text. The middle one is characteristic of Japanese kanji.