Unicode Converter
Type or paste any text to inspect each character's codepoint, UTF-8 bytes, UTF-16 encoding, HTML entities, JavaScript escape, CSS escape, and URL encoding. Click any character to see all representations.
Encoding representations
Codepoint (U+1F44B): The canonical identifier for a character in the Unicode standard, written as U+ followed by a hex number. There are 1,114,112 possible codepoints (U+0000 to U+10FFFF).
UTF-8 bytes (0xF0 0x9F 0x91 0x8B): The variable-length byte encoding. ASCII characters use 1 byte; most Latin-extended, Cyrillic, Arabic, and Hebrew characters use 2; CJK and many other scripts use 3; emoji and other supplementary characters use 4 bytes.
UTF-16 (0xD83D 0xDC4B): The encoding used internally by JavaScript, Java, and Windows. Characters above U+FFFF require two 16-bit code units called a surrogate pair (a high surrogate followed by a low surrogate).
HTML entities (&#128075; or &#x1F44B;): HTML numeric character references in decimal (&#128075;) or hexadecimal (&#x1F44B;) form. Safe to use in HTML even when the character can't be typed directly.
JavaScript escape (\u{1F44B}): For BMP characters (U+0000–U+FFFF), use \uXXXX; for supplementary-plane characters, use \u{XXXXX} (ES2015+). Needed inside string literals when the character can't appear directly in source code.
CSS escape (\1F44B): Used in CSS content property values and selectors: a backslash followed by the hex codepoint (add a trailing space if the next character could be read as a hex digit). Needed when inserting characters via CSS ::before / ::after.
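All of the representations above can be derived directly in JavaScript. A minimal sketch (Node.js 18+; the variable names are ours, not part of the tool):

```javascript
const ch = "👋";
const cp = ch.codePointAt(0); // 128075

// Codepoint: U+1F44B
console.log("U+" + cp.toString(16).toUpperCase());

// UTF-8 bytes: 0xF0 0x9F 0x91 0x8B
const utf8 = [...new TextEncoder().encode(ch)]
  .map(b => "0x" + b.toString(16).toUpperCase().padStart(2, "0"));
console.log(utf8.join(" "));

// UTF-16 code units: 0xD83D 0xDC4B (a surrogate pair)
const utf16 = [0, 1].map(i =>
  "0x" + ch.charCodeAt(i).toString(16).toUpperCase());
console.log(utf16.join(" "));

// HTML entities and URL encoding
console.log(`&#${cp};` + " / " + `&#x${cp.toString(16).toUpperCase()};`);
console.log(encodeURIComponent(ch)); // %F0%9F%91%8B
```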
UTF-8 byte count by codepoint range
How many bytes each character takes in UTF-8 storage
| Range | Bytes | Examples |
|---|---|---|
| U+0000 – U+007F | 1 | ASCII: A, 0, space, !… |
| U+0080 – U+07FF | 2 | Latin Extended, Cyrillic, Arabic, Hebrew |
| U+0800 – U+FFFF | 3 | Devanagari, CJK, emoji in BMP |
| U+10000 – U+10FFFF | 4 | Emoji 🎉, supplementary CJK, historic scripts |
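The table maps directly to a small helper. A sketch (the function name is ours) that returns the UTF-8 byte count for a given codepoint:

```javascript
// UTF-8 byte count by codepoint range, mirroring the table above
function utf8ByteCount(cp) {
  if (cp <= 0x7F) return 1;     // ASCII
  if (cp <= 0x7FF) return 2;    // Latin Extended, Cyrillic, Arabic, Hebrew
  if (cp <= 0xFFFF) return 3;   // rest of the BMP
  return 4;                     // U+10000 to U+10FFFF
}

console.log(utf8ByteCount("A".codePointAt(0)));  // 1
console.log(utf8ByteCount("é".codePointAt(0)));  // 2
console.log(utf8ByteCount("中".codePointAt(0))); // 3
console.log(utf8ByteCount("🎉".codePointAt(0))); // 4
```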
Frequently asked questions
What is the difference between Unicode, UTF-8, and UTF-16?
Unicode is the standard that assigns a number (codepoint) to every character from every writing system. UTF-8 and UTF-16 are encoding schemes that represent those codepoints as bytes for storage and transmission. UTF-8 uses 1–4 bytes per character and is ASCII-compatible — it dominates the web. UTF-16 uses 2 or 4 bytes and is used internally by JavaScript, Java, and Windows.
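To see the distinction concretely: one codepoint, two different byte-level representations. For example, U+00E9 ("é"):

```javascript
// Same codepoint, two encodings
const utf8Bytes = [...new TextEncoder().encode("é")];
console.log(utf8Bytes.map(b => b.toString(16))); // two bytes in UTF-8: c3, a9
console.log("é".charCodeAt(0).toString(16));     // one 16-bit unit in UTF-16: e9
```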
Why do emoji need 4 bytes in UTF-8?
UTF-8 encoding uses 1 byte for U+0000–U+007F (ASCII), 2 for U+0080–U+07FF, 3 for U+0800–U+FFFF, and 4 for U+10000–U+10FFFF. Most emoji live in the Supplementary Multilingual Plane (above U+FFFF), so they require 4 bytes. In UTF-16, supplementary plane characters require a surrogate pair — two 16-bit code units.
What is a surrogate pair in UTF-16?
UTF-16 encodes characters above U+FFFF as two 16-bit code units: a high surrogate (U+D800–U+DBFF) and a low surrogate (U+DC00–U+DFFF). Together they encode a single codepoint. JavaScript strings are UTF-16 internally, so a single emoji character has .length === 2. Use Array.from() or the spread operator to iterate by codepoints rather than code units.
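The surrogate-pair arithmetic can be sketched in a few lines, using the formula from the Unicode standard: subtract 0x10000, then split the 20-bit result into two 10-bit halves:

```javascript
const cp = 0x1F44B;                   // 👋
const offset = cp - 0x10000;          // 20-bit value
const high = 0xD800 + (offset >> 10); // top 10 bits -> 0xD83D
const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> 0xDC4B
console.log(high.toString(16), low.toString(16));

console.log("👋".length);       // 2 (code units)
console.log([..."👋"].length);  // 1 (codepoints, via spread)
```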
What is the Unicode BMP (Basic Multilingual Plane)?
The BMP is the first 65,536 codepoints (U+0000–U+FFFF). It covers most modern scripts: Latin, Cyrillic, Arabic, Hebrew, Devanagari, CJK, and many more. Characters outside the BMP (emoji, historic scripts, supplementary CJK) are in the supplementary planes (U+10000–U+10FFFF) and require extra handling in UTF-16.
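A one-line check (the helper name is ours) tells you whether a character needs that extra handling:

```javascript
// True if the character's codepoint fits in the BMP (no surrogate pair needed)
const inBMP = ch => ch.codePointAt(0) <= 0xFFFF;

console.log(inBMP("中"));  // true  (U+4E2D)
console.log(inBMP("🎉")); // false (U+1F389, supplementary plane)
```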
How do I use a Unicode character in HTML?
Three options: (1) Type the character directly if your editor and file encoding support it. (2) Use a decimal numeric entity: &#128075;. (3) Use a hex numeric entity: &#x1F44B;. All three are equivalent in HTML. Named entities like &amp; or &lt; only exist for the most common characters.
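Generating both numeric entity forms for any character is a short exercise (the helper name is ours):

```javascript
// Decimal and hex numeric character references for a single character
function htmlEntities(ch) {
  const cp = ch.codePointAt(0);
  return {
    dec: `&#${cp};`,
    hex: `&#x${cp.toString(16).toUpperCase()};`,
  };
}

console.log(htmlEntities("👋")); // { dec: '&#128075;', hex: '&#x1F44B;' }
```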
Related Tools
All conversions run in your browser — nothing is uploaded.
Browse all 26 converters →