Unicode Converter

Type or paste any text to inspect each character's codepoint, UTF-8 bytes, UTF-16 encoding, HTML entities, JavaScript escape, CSS escape, and URL encoding. Click any character to see all representations.

Encoding representations

Unicode codepoint

U+1F44B

The canonical identifier for a character in the Unicode standard. Written as U+ followed by a hex number. There are 1,114,112 possible codepoints (U+0000 to U+10FFFF).
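As a rough illustration of the notation, the sketch below formats the first codepoint of a string as U+XXXX. The helper name formatCodepoint is hypothetical, and padding to at least four hex digits is the usual convention assumed here.

```javascript
// Hypothetical helper: format the first codepoint of a string as U+XXXX.
function formatCodepoint(text) {
  // codePointAt reads a whole codepoint, including those above U+FFFF.
  const cp = text.codePointAt(0);
  if (cp === undefined) {
    throw new RangeError("empty string has no codepoint");
  }
  // Uppercase hex, left-padded to at least four digits.
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

console.log(formatCodepoint("\u{1F44B}")); // U+1F44B
console.log(formatCodepoint("A"));         // U+0041
```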

UTF-8 bytes

0xF0 0x9F 0x91 0x8B

A variable-length byte encoding of Unicode codepoints. ASCII characters use 1 byte; accented Latin, Greek, Cyrillic, Arabic, and Hebrew use 2; most other BMP scripts, including CJK, use 3; emoji and other supplementary-plane characters use 4 bytes.
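One way to reproduce the byte listing above is the standard TextEncoder API, which always encodes to UTF-8. This is a minimal sketch; utf8Hex is a hypothetical helper name, not necessarily how this tool computes its output.

```javascript
// Hypothetical helper: list the UTF-8 bytes of a string as 0xNN values.
function utf8Hex(text) {
  // TextEncoder.encode returns a Uint8Array of UTF-8 bytes.
  const bytes = new TextEncoder().encode(text);
  return Array.from(
    bytes,
    (b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0")
  ).join(" ");
}

console.log(utf8Hex("\u{1F44B}")); // 0xF0 0x9F 0x91 0x8B
console.log(utf8Hex("A"));         // 0x41
```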

UTF-16 words

0xD83D 0xDC4B

The encoding used by JavaScript, Java, and Windows internally. Characters above U+FFFF require two 16-bit code units called a surrogate pair (high surrogate + low surrogate).
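The surrogate pair for a supplementary codepoint can be derived by hand: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. The sketch below shows that arithmetic; toSurrogatePair is a hypothetical name used only for illustration.

```javascript
// Hypothetical helper: split a supplementary codepoint into UTF-16 surrogates.
function toSurrogatePair(cp) {
  if (!Number.isInteger(cp) || cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("codepoint must be in U+10000..U+10FFFF");
  }
  const offset = cp - 0x10000;             // 20-bit value
  const high = 0xd800 + (offset >> 10);    // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);   // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f44b);
console.log(high.toString(16), low.toString(16)); // d83d dc4b
```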

HTML entities

&#128075; or &#x1F44B;

HTML numeric character references in decimal (&#128075;) or hexadecimal (&#x1F44B;) form. Safe to use in HTML even when the character can't be typed directly.
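Both reference forms are just the codepoint number wrapped in &# and ; (with an x prefix for hex). A small sketch, using a hypothetical helper named htmlNumericRefs:

```javascript
// Hypothetical helper: build decimal and hex numeric character references
// for the first codepoint of a string.
function htmlNumericRefs(text) {
  const cp = text.codePointAt(0);
  if (cp === undefined) {
    throw new RangeError("empty string has no codepoint");
  }
  return {
    decimal: "&#" + cp + ";",
    hex: "&#x" + cp.toString(16).toUpperCase() + ";",
  };
}

const refs = htmlNumericRefs("\u{1F44B}");
console.log(refs.decimal); // &#128075;
console.log(refs.hex);     // &#x1F44B;
```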

JavaScript / JSON escape

\u{1F44B}

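For characters in the BMP (U+0000–U+FFFF): \uXXXX. For supplementary-plane characters, JavaScript (ES6+) also accepts \u{XXXXX}. JSON has no \u{...} form, so there the same character is written as a surrogate pair: \uD83D\uDC4B. Useful inside string literals when the character can't be typed or stored directly in the source file.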
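The sketch below generates both escape styles. Note that JSON only defines the four-digit \uXXXX form, so supplementary characters are emitted there as two escaped UTF-16 code units. The names jsEscape and jsonEscape are hypothetical.

```javascript
// Hypothetical helper: JavaScript source escape for one codepoint.
function jsEscape(cp) {
  const hex = cp.toString(16).toUpperCase();
  // BMP codepoints fit \uXXXX; anything higher needs the braced form.
  return cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}";
}

// Hypothetical helper: JSON-safe escape, one \uXXXX per UTF-16 code unit.
function jsonEscape(text) {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    out += "\\u" + unit.toString(16).toUpperCase().padStart(4, "0");
  }
  return out;
}

console.log(jsEscape(0x1f44b));       // \u{1F44B}
console.log(jsEscape(0x41));          // \u0041
console.log(jsonEscape("\u{1F44B}")); // \uD83D\uDC4B
```

Passing the jsonEscape output through JSON.parse inside double quotes should give back the original character, which is a quick way to check the result.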

CSS escape

\1F44B

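Used in CSS content property values and selectors. A backslash followed by the codepoint as one to six hex digits. If the character that follows could be read as another hex digit, end the escape with a single space. Handy for inserting characters via CSS ::before / ::after when the stylesheet can't contain them literally.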
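A minimal sketch of producing the CSS form from a codepoint. The helper name cssEscape and its terminate option are hypothetical; the optional trailing space relies on the CSS rule that a single whitespace character after a hex escape is consumed as its terminator.

```javascript
// Hypothetical helper: CSS hex escape for one codepoint.
// Pass { terminate: true } to append the space that ends the escape,
// which matters when the next character is a hex digit (0-9, A-F).
function cssEscape(cp, { terminate = false } = {}) {
  const hex = cp.toString(16).toUpperCase();
  return "\\" + hex + (terminate ? " " : "");
}

console.log(cssEscape(0x1f44b));                      // \1F44B
console.log(cssEscape(0xe9, { terminate: true }) + "cole"); // \E9 cole
```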

UTF-8 byte count by codepoint range

How many bytes each character takes in UTF-8 storage

Range	Bytes	Examples
U+0000 – U+007F	1	ASCII: A, 0, space, !…
U+0080 – U+07FF	2	Latin Extended, Cyrillic, Arabic, Hebrew
U+0800 – U+FFFF	3	Devanagari, CJK, emoji in BMP
U+10000 – U+10FFFF	4	Emoji 🎉, supplementary CJK, historic scripts
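The table translates directly into a range check. The sketch below (utf8ByteCount is a hypothetical name) returns the byte count for a codepoint, and the usage lines compare it with what TextEncoder actually produces.

```javascript
// Hypothetical helper: UTF-8 byte count for a codepoint, per the table above.
function utf8ByteCount(cp) {
  if (!Number.isInteger(cp) || cp < 0 || cp > 0x10ffff) {
    throw new RangeError("not a Unicode codepoint");
  }
  if (cp <= 0x7f) return 1;   // U+0000 – U+007F
  if (cp <= 0x7ff) return 2;  // U+0080 – U+07FF
  if (cp <= 0xffff) return 3; // U+0800 – U+FFFF
  return 4;                   // U+10000 – U+10FFFF
}

// Cross-check a few samples against the real encoder.
for (const cp of [0x41, 0x416, 0x4e2d, 0x1f389]) {
  const actual = new TextEncoder().encode(String.fromCodePoint(cp)).length;
  console.log(cp.toString(16), utf8ByteCount(cp), actual);
}
```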

Frequently asked questions

What is the difference between Unicode, UTF-8, and UTF-16?

Unicode is the standard that assigns a number (codepoint) to every character from every writing system. UTF-8 and UTF-16 are encoding schemes that represent those codepoints as bytes for storage and transmission. UTF-8 uses 1–4 bytes per character and is ASCII-compatible — it dominates the web. UTF-16 uses 2 or 4 bytes and is used internally by JavaScript, Java, and Windows.

Why do emoji need 4 bytes in UTF-8?

UTF-8 encoding uses 1 byte for U+0000–U+007F (ASCII), 2 for U+0080–U+07FF, 3 for U+0800–U+FFFF, and 4 for U+10000–U+10FFFF. Most emoji live in the Supplementary Multilingual Plane (above U+FFFF), so they require 4 bytes. In UTF-16, supplementary plane characters require a surrogate pair — two 16-bit code units.

What is a surrogate pair in UTF-16?

UTF-16 encodes characters above U+FFFF as two 16-bit code units: a high surrogate (U+D800–U+DBFF) and a low surrogate (U+DC00–U+DFFF). Together they encode a single codepoint. JavaScript strings are UTF-16 internally, so a single supplementary-plane character such as 👋 has .length === 2, and emoji built from several codepoints (flags, skin-tone variants) are longer still. Use Array.from() or the spread operator to iterate by codepoints rather than code units.
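The difference between code units and codepoints is easy to see in a console. In this sketch, codepoints is a hypothetical helper name; Array.from and spread both use the string iterator, which walks codepoints rather than UTF-16 code units.

```javascript
const wave = "\u{1F44B}"; // one codepoint, stored as a surrogate pair

console.log(wave.length);             // 2 (UTF-16 code units)
console.log(Array.from(wave).length); // 1 (codepoints)
console.log([...wave].length);        // 1 (codepoints)

// Hypothetical helper: list the codepoints of a string as numbers.
function codepoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

console.log(codepoints("a" + wave)); // [ 97, 128075 ]
```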

What is the Unicode BMP (Basic Multilingual Plane)?

The BMP is the first 65,536 codepoints (U+0000–U+FFFF). It covers most modern scripts: Latin, Cyrillic, Arabic, Hebrew, Devanagari, CJK, and many more. Characters outside the BMP (emoji, historic scripts, supplementary CJK) are in the supplementary planes (U+10000–U+10FFFF) and require extra handling in UTF-16.

How do I use a Unicode character in HTML?

Three options: (1) Type the character directly if your editor and file encoding support it. (2) Use a decimal numeric reference: &#128075; (3) Use a hex numeric reference: &#x1F44B; All three are equivalent in HTML. Named entities like &amp; or &lt; only exist for the most common characters.

All conversions run in your browser — nothing is uploaded.
