In the Unicode standard, a plane is a contiguous group of 65,536 (2^16) code points. There are 17 planes, identified by the numbers 0 to 16, which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position hexadecimal format (U+hhhhhh). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 15.1, five of the planes have assigned code points (characters), and seven are named.
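Because each plane holds exactly 2^16 code points, the plane number is simply the code point with its low 16 bits discarded. The following is a minimal sketch of that relationship; the function name `plane_of` and the constants are illustrative and not part of any standard API.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point of plane 16


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) containing code_point (hypothetical helper)."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits is the same as integer division by PLANE_SIZE.
    return code_point >> 16


print(plane_of(ord("A")))    # 0  (Basic Multilingual Plane)
print(plane_of(0x1F600))     # 1  (Supplementary Multilingual Plane)
print(plane_of(0x10FFFF))    # 16
```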
The limit of 17 planes is due to UTF-16, which can encode 2^20 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 2^31 (2,147,483,648) code points (32,768 planes), and would still be able to encode 2^21 (2,097,152) code points (32 planes) even under the current limit of 4 bytes.
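The arithmetic behind these limits can be sketched as follows. The helper `to_surrogate_pair` is a hypothetical name used only for illustration; it follows the UTF-16 bit distribution, in which a supplementary code point is reduced to a 20-bit offset and split into two 10-bit halves.

```python
PLANE_SIZE = 0x10000


def to_surrogate_pair(code_point: int) -> tuple:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in planes 1 to 16 need a surrogate pair")
    offset = code_point - 0x10000        # 20-bit value, 0 .. 0xFFFFF
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


# 1,024 high surrogates x 1,024 low surrogates give 2**20 code points = 16 planes.
assert 1024 * 1024 == 2**20 == 16 * PLANE_SIZE
# A 4-byte UTF-8 sequence carries 3 + 6 + 6 + 6 = 21 payload bits = 32 planes.
assert 2**(3 + 6 + 6 + 6) == 32 * PLANE_SIZE
# The original UTF-8 design allowed 31 bits = 32,768 planes.
assert 2**31 == 32_768 * PLANE_SIZE

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00
```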
The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are surrogates (used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment.
Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 328 blocks defined in Unicode 15.1 cover 26% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.
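These figures can be reproduced with a short calculation. The breakdown of the 66 non-characters (32 in U+FDD0..U+FDEF plus the last two code points of each plane) and of the private-use total (the BMP Private Use Area plus planes 15 and 16 without their final two non-characters) is not spelled out above; it is added here as an assumption taken from the Unicode Standard.

```python
PLANES = 17
PLANE_SIZE = 0x10000

total = PLANES * PLANE_SIZE                      # 1,114,112
surrogates = 0xDFFF - 0xD800 + 1                 # 2,048

# Assumption: 32 non-characters in U+FDD0..U+FDEF, plus U+nFFFE and U+nFFFF per plane.
noncharacters = 32 + 2 * PLANES                  # 66

# BMP Private Use Area (E000..F8FF) plus planes 15 and 16,
# each minus the two non-characters at the end of the plane.
private_use = (0xF8FF - 0xE000 + 1) + 2 * (PLANE_SIZE - 2)   # 137,468

public = total - surrogates - noncharacters - private_use
print(f"{public:,}")  # 974,530
```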
Overview
Assigned characters
Plane | Allocated code points (version 15.1) | Assigned characters |
---|---|---|
0 BMP | 65,520 | 55,639 |
1 SMP | 26,160 | 23,276 |
2 SIP | 61,536 | 61,495 |
3 TIP | 9,136 | 9,131 |
14 SSP | 368 | 337 |
15 SPUA-A | 65,536 | 0 (by definition) |
16 SPUA-B | 65,536 | 0 (by definition) |
Totals | 293,792 | 149,878 |
Note: "allocated code points" are code points which have been allocated to a Unicode block.
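As a quick consistency check, the totals row and the share of the code space covered by blocks can be recomputed from the per-plane figures. The dictionary below simply copies the table; the variable names are illustrative.

```python
# (allocated code points, assigned characters) per plane, copied from the table
table = {
    "0 BMP": (65_520, 55_639),
    "1 SMP": (26_160, 23_276),
    "2 SIP": (61_536, 61_495),
    "3 TIP": (9_136, 9_131),
    "14 SSP": (368, 337),
    "15 SPUA-A": (65_536, 0),
    "16 SPUA-B": (65_536, 0),
}

allocated = sum(alloc for alloc, _ in table.values())
assigned = sum(chars for _, chars in table.values())
code_space = 17 * 0x10000  # 17 planes of 65,536 code points

print(f"{allocated:,} allocated, {assigned:,} assigned")   # 293,792 allocated, 149,878 assigned
print(f"{allocated / code_space:.1%} of the code space")   # 26.4% of the code space
```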
Basic Multilingual Plane
The first plane, plane 0, the Basic Multilingual Plane (BMP), contains characters for almost all modern languages, and a large number of symbols. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.
The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
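Going in the decoding direction, a High Surrogate and a Low Surrogate can be recombined into the non-BMP code point they represent. This is a minimal sketch; `from_surrogate_pair` is a hypothetical helper name, not a library function.

```python
def from_surrogate_pair(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a supplementary code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#x}")
    # Each surrogate contributes 10 bits; the result is offset past the BMP.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print(hex(from_surrogate_pair(0xD83D, 0xDE00)))  # 0x1f600
print(hex(from_surrogate_pair(0xDBFF, 0xDFFF)))  # 0x10ffff
```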
65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF).
- Alphabetic left-to-right scripts:
- Basic Latin (Lower half of ISO/IEC 8859-1: ISO/IEC 646:1991-IRV aka ASCII) (0000–007F)
- Latin-1 Supplement (Upper half of ISO/IEC 8859-1) (0080–00FF)
- Latin Extended-A (0100–017F)
- Latin Extended-B (0180–024F)
- IPA Extensions (0250–02AF)
- Spacing Modifier Letters (02B0–02FF)
- Combining Diacritical Marks (0300–036F)
- Greek and Coptic (0370–03FF)
- Cyrillic (0400–04FF)
- Cyrillic Supplement (0500–052F)
- Armenian (0530–058F)
- Semitic abjads and other right-to-left scripts:
- Hebrew (0590–05FF)
- Arabic (0600–06FF)
- Syriac (0700–074F)
- Arabic Supplement (0750–077F)
- Thaana (0780–07BF)
- N'Ko (07C0–07FF)
- Samaritan (0800–083F)
- Mandaic (0840–085F)
- Syriac Supplement (0860–086F)
- Arabic Extended-B (0870–089F)
- Arabic Extended-A (08A0–08FF)
- Brahmic scripts:
- Devanagari (0900–097F)
- Bengali (0980–09FF)
- Gurmukhi (0A00–0A7F)
- Gujarati (0A80–0AFF)
- Oriya (0B00–0B7F)
- Tamil (0B80–0BFF)
- Telugu (0C00–0C7F)
- Kannada (0C80–0CFF)
- Malayalam (0D00–0D7F)
- Sinhala (0D80–0DFF)
- Thai (0E00–0E7F)
- Lao (0E80–0EFF)
- Tibetan (0F00–0FFF)
- Myanmar (1000–109F)
- Other alphabetic/syllabic left-to-right scripts:
- Georgian (10A0–10FF)
- Hangul Jamo (1100–11FF)
- Ethiopic (1200–137F)
- Ethiopic Supplement (1380–139F)
- Cherokee (13A0–13FF)
- Unified Canadian Aboriginal Syllabics (1400–167F)
- Ogham (1680–169F)
- Runic (16A0–16FF)
- Philippine scripts:
- Khmer (1780–17FF)
- Mongolian (1800–18AF)
- Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
- Brahmic scripts:
- Limbu (1900–194F)
- Tai scripts:
- Tai Le (1950–197F)
- New Tai Lue (1980–19DF)
- Khmer Symbols (19E0–19FF)
- Buginese (1A00–1A1F)
- Tai Tham (1A20–1AAF)
- Combining Diacritical Marks Extended (1AB0–1AFF)
- Indonesian scripts:
- Lepcha (1C00–1C4F)
- Ol Chiki (1C50–1C7F)
- Other left-to-right alphabetic or syllabic supplements:
- Cyrillic Extended-C (1C80–1C8F)
- Georgian Extended (1C90–1CBF)
- Sundanese Supplement (1CC0–1CCF)
- Vedic Extensions (1CD0–1CFF)
- Other left-to-right alphabetic supplements:
- Phonetic Extensions (1D00–1D7F)
- Phonetic Extensions Supplement (1D80–1DBF)
- Combining Diacritical Marks Supplement (1DC0–1DFF)
- Latin Extended Additional (1E00–1EFF)
- Greek Extended (1F00–1FFF)
- Symbols:
- General Punctuation (2000–206F)
- Superscripts and Subscripts (2070–209F)
- Currency Symbols (20A0–20CF)
- Combining Diacritical Marks for Symbols (20D0–20FF)
- Letterlike Symbols (2100–214F)
- Number Forms (2150–218F)
- Arrows (2190–21FF)
- Mathematical Operators (2200–22FF)
- Miscellaneous Technical (2300–23FF)
- Control Pictures (2400–243F)
- Optical Character Recognition (2440–245F)
- Enclosed Alphanumerics (2460–24FF)
- Box Drawing (2500–257F)
- Block Elements (2580–259F)
- Geometric Shapes (25A0–25FF)
- Miscellaneous Symbols (2600–26FF)
- Dingbats (2700–27BF)
- Miscellaneous Mathematical Symbols-A (27C0–27EF)
- Supplemental Arrows-A (27F0–27FF)
- Braille Patterns (2800–28FF)
- Supplemental Arrows-B (2900–297F)
- Miscellaneous Mathematical Symbols-B (2980–29FF)
- Supplemental Mathematical Operators (2A00–2AFF)
- Miscellaneous Symbols and Arrows (2B00–2BFF)
- Other left-to-right alphabetic scripts or supplements:
- Glagolitic (2C00–2C5F)
- Latin Extended-C (2C60–2C7F)
- Coptic (2C80–2CFF)
- Georgian Supplement (2D00–2D2F)
- African scripts:
- Tifinagh (2D30–2D7F)
- Ethiopic Extended (2D80–2DDF)
- Other left-to-right alphabetic supplements:
- Cyrillic Extended-A (2DE0–2DFF)
- Supplemental Punctuation (2E00–2E7F)
- CJK scripts and symbols:
- CJK Radicals Supplement (2E80–2EFF)
- Kangxi Radicals (2F00–2FDF)
- Ideographic Description Characters (2FF0–2FFF)
- CJK Symbols and Punctuation (3000–303F)
- Hiragana (3040–309F)
- Katakana (30A0–30FF)
- Bopomofo (3100–312F)
- Hangul Compatibility Jamo (3130–318F)
- Kanbun (3190–319F)
- Bopomofo Extended (31A0–31BF)
- CJK Strokes (31C0–31EF)
- Katakana Phonetic Extensions (31F0–31FF)
- Enclosed CJK Letters and Months (3200–32FF)
- CJK Compatibility (3300–33FF)
- CJK Unified Ideographs Extension A (3400–4DBF)
- Yijing Hexagram Symbols (4DC0–4DFF)
- CJK Unified Ideographs (4E00–9FFF)
- Yi Syllables (A000–A48F)
- Yi Radicals (A490–A4CF)
- Lisu (A4D0–A4FF)
- African scripts:
- Vai (A500–A63F)
- Other left-to-right alphabetic supplements:
- Cyrillic Extended-B (A640–A69F)
- African scripts:
- Bamum (A6A0–A6FF)
- Other left-to-right alphabetic supplements:
- Modifier Tone Letters (A700–A71F)
- Latin Extended-D (A720–A7FF)
- Brahmic scripts:
- Syloti Nagri (A800–A82F)
- Common Indic Number Forms (A830–A83F)
- Phags-pa (A840–A87F)
- Saurashtra (A880–A8DF)
- Devanagari Extended (A8E0–A8FF)
- Kayah Li (A900–A92F)
- Rejang (A930–A95F)
- Hangul Jamo Extended-A (A960–A97F)
- Brahmic scripts:
- Javanese (A980–A9DF)
- Myanmar Extended-B (A9E0–A9FF)
- Cham (AA00–AA5F)
- Myanmar Extended-A (AA60–AA7F)
- Tai Viet (AA80–AADF)
- Meetei Mayek Extensions (AAE0–AAFF)
- Ethiopic Extended-A (AB00–AB2F)
- Latin Extended-E (AB30–AB6F)
- Cherokee Supplement (AB70–ABBF)
- Meetei Mayek (ABC0–ABFF)
- Hangul Syllables (AC00–D7AF)
- Hangul Jamo Extended-B (D7B0–D7FF)
- Surrogates:
- High Surrogates (D800–DB7F)
- High Private Use Surrogates (DB80–DBFF)
- Low Surrogates (DC00–DFFF)
- Private Use Area (E000–F8FF)
- CJK Compatibility Ideographs (F900–FAFF)
- Alphabetic Presentation Forms (FB00–FB4F)
- Arabic Presentation Forms-A (FB50–FDFF)
- Variation Selectors (FE00–FE0F)
- Vertical Forms (FE10–FE1F)
- Combining Half Marks (FE20–FE2F)
- CJK Compatibility Forms (FE30–FE4F)
- Small Form Variants (FE50–FE6F)
- Arabic Presentation Forms-B (FE70–FEFF)
- Halfwidth and Fullwidth Forms (FF00–FFEF)
- Specials (FFF0–FFFF)
Supplementary Multilingual Plane
Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include Linear B, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like Osage, Warang Citi, Adlam, Wancho and Toto. Symbols and notations include historic and modern musical notation; mathematical alphanumerics; shorthands; Emoji and other pictographic sets; and game symbols for playing cards, mahjong, and dominoes.
- Archaic Greek and other left-to-right scripts:
- Linear B Syllabary (10000–1007F)
- Linear B Ideograms (10080–100FF)
- Aegean Numbers (10100–1013F)
- Ancient Greek Numbers (10140–1018F)
- Ancient Symbols (10190–101CF)
- Phaistos Disc (101D0–101FF)
- Lycian (10280–1029F)
- Carian (102A0–102DF)
- Coptic Epact Numbers (102E0–102FF)
- Old Italic (10300–1032F)
- Gothic (10330–1034F)
- Old Permic (10350–1037F)
- Ugaritic (10380–1039F)
- Old Persian (103A0–103DF)
- Deseret (10400–1044F)
- Shavian (10450–1047F)
- Osmanya (10480–104AF)
- Osage (104B0–104FF)
- Elbasan (10500–1052F)
- Caucasian Albanian (10530–1056F)
- Vithkuqi (10570–105BF)
- Linear A (10600–1077F)
- Latin Extended-F (10780–107BF)
- Right-to-left scripts:
- Cypriot Syllabary (10800–1083F)
- Imperial Aramaic (10840–1085F)
- Palmyrene (10860–1087F)
- Nabataean (10880–108AF)
- Hatran (108E0–108FF)
- Phoenician (10900–1091F)
- Lydian (10920–1093F)
- Meroitic Hieroglyphs (10980–1099F)
- Meroitic Cursive (109A0–109FF)
- Kharoshthi (10A00–10A5F)
- Old South Arabian (10A60–10A7F)
- Old North Arabian (10A80–10A9F)
- Manichaean (10AC0–10AFF)
- Avestan (10B00–10B3F)
- Inscriptional Parthian (10B40–10B5F)
- Inscriptional Pahlavi (10B60–10B7F)
- Psalter Pahlavi (10B80–10BAF)
- Old Turkic (10C00–10C4F)
- Old Hungarian (10C80–10CFF)
- Hanifi Rohingya (10D00–10D3F)
- Rumi Numeral Symbols (10E60–10E7F)
- Yezidi (10E80–10EBF)
- Arabic Extended-C (10EC0–10EFF)
- Old Sogdian (10F00–10F2F)
- Sogdian (10F30–10F6F)
- Old Uyghur (10F70–10FAF)
- Chorasmian (10FB0–10FDF)
- Elymaic (10FE0–10FFF)
- Brahmic scripts:
- Brahmi (11000–1107F)
- Kaithi (11080–110CF)
- Sora Sompeng (110D0–110FF)
- Chakma (11100–1114F)
- Mahajani (11150–1117F)
- Sharada (11180–111DF)
- Sinhala Archaic Numbers (111E0–111FF)
- Khojki (11200–1124F)
- Multani (11280–112AF)
- Khudawadi (112B0–112FF)
- Grantha (11300–1137F)
- Newa (11400–1147F)
- Tirhuta (11480–114DF)
- Siddham (11580–115FF)
- Modi (11600–1165F)
- Mongolian Supplement (11660–1167F)
- Takri (11680–116CF)
- Ahom (11700–1174F)
- Dogra (11800–1184F)
- Warang Citi (118A0–118FF)
- Dives Akuru (11900–1195F)
- Nandinagari (119A0–119FF)
- Zanabazar Square (11A00–11A4F)
- Soyombo (11A50–11AAF)
- Unified Canadian Aboriginal Syllabics Extended-A (11AB0–11ABF)
- Brahmic scripts:
- Pau Cin Hau (11AC0–11AFF)
- Devanagari Extended-A (11B00–11B5F)
- Bhaiksuki (11C00–11C6F)
- Marchen (11C70–11CBF)
- Masaram Gondi (11D00–11D5F)
- Gunjala Gondi (11D60–11DAF)
- Makasar (11EE0–11EFF)
- Kawi (11F00–11F5F)
- Lisu Supplement (11FB0–11FBF)
- Tamil Supplement (11FC0–11FFF)
- Cuneiform scripts:
- Cuneiform (12000–123FF)
- Cuneiform Numbers and Punctuation (12400–1247F)
- Early Dynastic Cuneiform (12480–1254F)
- Cypro-Minoan (12F90–12FFF)
- Hieroglyphic scripts:
- Egyptian Hieroglyphs (13000–1342F)
- Egyptian Hieroglyph Format Controls (13430–1345F)
- Anatolian Hieroglyphs (14400–1467F)
- Bamum Supplement (16800–16A3F)
- Mro (16A40–16A6F)
- Tangsa (16A70–16ACF)
- Bassa Vah (16AD0–16AFF)
- Pahawh Hmong (16B00–16B8F)
- Medefaidrin (16E40–16E9F)
- Miao (16F00–16F9F)
- East Asian scripts:
- Ideographic Symbols and Punctuation (16FE0–16FFF)
- Tangut (17000–187FF)
- Tangut Components (18800–18AFF)
- Khitan Small Script (18B00–18CFF)
- Tangut Supplement (18D00–18D7F)
- Kana Extended-B (1AFF0–1AFFF)
- Kana Supplement (1B000–1B0FF)
- Kana Extended-A (1B100–1B12F)
- Small Kana Extension (1B130–1B16F)
- Nushu (1B170–1B2FF)
- Notational writing systems:
- Duployan (1BC00–1BC9F)
- Shorthand Format Controls (1BCA0–1BCAF)
- Symbols and numerals:
- Musical notation:
- Znamenny Musical Notation (1CF00–1CFCF)
- Byzantine Musical Symbols (1D000–1D0FF)
- Musical Symbols (1D100–1D1FF)
- Ancient Greek Musical Notation (1D200–1D24F)
- Kaktovik Numerals (1D2C0–1D2DF)
- Mayan Numerals (1D2E0–1D2FF)
- Mathematical symbols:
- Tai Xuan Jing Symbols (1D300–1D35F)
- Counting Rod Numerals (1D360–1D37F)
- Mathematical Alphanumeric Symbols (1D400–1D7FF)
- Musical notation:
- Notational writing systems:
- Sutton SignWriting (1D800–1DAAF)
- Other left-to-right scripts:
- Latin Extended-G (1DF00–1DFFF)
- Glagolitic Supplement (1E000–1E02F)
- Cyrillic Extended-D (1E030–1E08F)
- Nyiakeng Puachue Hmong (1E100–1E14F)
- Toto (1E290–1E2BF)
- Wancho (1E2C0–1E2FF)
- Nag Mundari (1E4D0–1E4FF)
- African scripts:
- Ethiopic Extended-B (1E7E0–1E7FF)
- Mende Kikakui (1E800–1E8DF)
- Adlam (1E900–1E95F)
- Symbols and numerals:
- Indic Siyaq Numbers (1EC70–1ECBF)
- Ottoman Siyaq Numbers (1ED00–1ED4F)
- Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF)
- Game tiles and cards:
- Mahjong Tiles (1F000–1F02F)
- Domino Tiles (1F030–1F09F)
- Playing Cards (1F0A0–1F0FF)
- Enclosed Alphanumeric Supplement (1F100–1F1FF)
- Enclosed Ideographic Supplement (1F200–1F2FF)
- Miscellaneous Symbols and Pictographs (1F300–1F5FF)
- Emoticons (1F600–1F64F)
- Ornamental Dingbats (1F650–1F67F)
- Transport and Map Symbols (1F680–1F6FF)
- Alchemical Symbols (1F700–1F77F)
- Geometric Shapes Extended (1F780–1F7FF)
- Supplemental Arrows-C (1F800–1F8FF)
- Supplemental Symbols and Pictographs (1F900–1F9FF)
- Chess Symbols (1FA00–1FA6F)
- Symbols and Pictographs Extended-A (1FA70–1FAFF)
- Symbols for Legacy Computing (1FB00–1FBFF)
Supplementary Ideographic Plane
Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards.
As of Unicode 15.1, the SIP comprises the following seven blocks:
- CJK Unified Ideographs Extension B (20000–2A6DF)
- CJK Unified Ideographs Extension C (2A700–2B73F)
- CJK Unified Ideographs Extension D (2B740–2B81F)
- CJK Unified Ideographs Extension E (2B820–2CEAF)
- CJK Unified Ideographs Extension F (2CEB0–2EBEF)
- CJK Unified Ideographs Extension I (2EBF0–2EE5F)
- CJK Compatibility Ideographs Supplement (2F800–2FA1F)
Tertiary Ideographic Plane
Plane 3 is the Tertiary Ideographic Plane (TIP). CJK Unified Ideographs Extension G was added to the TIP in Unicode 13.0, released in March 2020. The plane is also tentatively allocated for Oracle Bone script and Small Seal Script.
As of Unicode 15.1, the TIP comprises the following two blocks:
- CJK Unified Ideographs Extension G (30000–3134F)
- CJK Unified Ideographs Extension H (31350–323AF)
Unassigned planes
No characters have yet been assigned, or proposed for assignment, to planes 4 through 13 (planes 4 to D in hexadecimal).
Supplementary Special-purpose Plane
Plane 14 (E in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). As of Unicode 15.1, it comprises the following two blocks:
- Tags (E0000–E007F)
- Variation Selectors Supplement (E0100–E01EF), used to indicate alternate glyphs for characters.
Private Use Area Planes
The two planes 15 and 16 (planes F and 10 in hexadecimal) each contain a "Private Use Area". They contain blocks named Supplementary Private Use Area-A (PUA-A) and -B (PUA-B). The Private Use Areas are available for use by parties outside ISO and Unicode (private character encoding).
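A membership test for private-use code points can be sketched as below. The BMP range is taken from the block list above; excluding the last two code points of planes 15 and 16 (which are non-characters) is an assumption drawn from the Unicode Standard, and `is_private_use` is a hypothetical helper name. Python's standard `unicodedata` module reports the general category `Co` for such code points, which gives an independent cross-check.

```python
import unicodedata

# Inclusive ranges; the plane 15 and 16 ranges stop before the two
# non-characters at the end of each plane (assumption, see lead-in).
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),       # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),     # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),   # Supplementary Private Use Area-B (plane 16)
]


def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in one of the private-use ranges."""
    return any(lo <= code_point <= hi for lo, hi in PRIVATE_USE_RANGES)


for cp in (0xE000, 0xF0000, 0x10FFFD, 0x0041):
    # Compare the range test with the general category from unicodedata.
    print(f"U+{cp:04X}", is_private_use(cp), unicodedata.category(chr(cp)))
```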