What Is a UTF-8 Encoder / Decoder?

UTF-8 is the world's most widely used character encoding. It can represent every character in the Unicode standard — from plain ASCII letters to emoji, Chinese characters, and Arabic script — using a variable number of bytes (1 to 4). This tool lets you see exactly which bytes your text produces when encoded as UTF-8, and it can reverse the process too.

Unlike ASCII, which maps one character to exactly one byte, UTF-8 uses 1 byte for ASCII-range characters (U+0000–U+007F), 2 bytes for characters up to U+07FF, 3 bytes for characters up to U+FFFF (including the euro sign €), and 4 bytes for characters beyond U+FFFF like many emoji. The TextEncoder API in modern browsers makes this encoding trivial to perform in JavaScript.
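
The byte counts above can be checked directly with the standard TextEncoder API, which always encodes to UTF-8. A small sketch (the toHex helper is illustrative, not part of this tool):

```javascript
// Byte lengths for one character from each UTF-8 range.
const enc = new TextEncoder();

// Illustrative helper: encode a string and format the bytes as spaced hex.
const toHex = (s) =>
  Array.from(enc.encode(s), (b) =>
    b.toString(16).padStart(2, "0").toUpperCase()
  ).join(" ");

console.log(toHex("A"));  // "41"           (1 byte,  ASCII, U+0041)
console.log(toHex("é"));  // "C3 A9"        (2 bytes, U+00E9)
console.log(toHex("€"));  // "E2 82 AC"     (3 bytes, U+20AC)
console.log(toHex("😀")); // "F0 9F 98 80"  (4 bytes, U+1F600)
```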

How to Use

1

Pick a mode

Select Encode to convert text → UTF-8 hex bytes, or Decode to convert hex bytes → text. You can also customise the output format using the separator and case options.

2

Paste or type your input

In Encode mode, type any Unicode text. In Decode mode, paste hex bytes separated by spaces, commas, or with 0x prefixes — the tool accepts all common formats.

3

Copy or download the result

The conversion happens in real time. Use Copy to copy the output or Download to save it as a .txt file. The byte count is shown next to the output panel header.
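
The encode path described in these steps can be sketched in a few lines of JavaScript. The encodeToHex function and its option names are illustrative, not this tool's actual code:

```javascript
// Sketch of Encode mode with the separator and case options
// mentioned in step 1 (hypothetical function, for illustration).
function encodeToHex(text, { separator = " ", uppercase = true } = {}) {
  const bytes = new TextEncoder().encode(text); // always UTF-8
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0"));
  const out = hex.join(separator);
  return uppercase ? out.toUpperCase() : out;
}

console.log(encodeToHex("Hi"));                                       // "48 69"
console.log(encodeToHex("Hi", { separator: ",", uppercase: false })); // "48,69"
```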

Example

Here are two worked examples showing how UTF-8 encoding looks for ASCII text and for a multi-byte Unicode character.

Encode: "Hello €"

Input: Hello €

Output: 48 65 6C 6C 6F 20 E2 82 AC

Decode: UTF-8 bytes → text

Input: 48 65 6C 6C 6F 20 E2 82 AC

Output: Hello €

FAQ

What is UTF-8 and why does everyone use it?

UTF-8 is a variable-width encoding for Unicode. It was designed to be backward-compatible with ASCII — the first 128 code points map to the same single byte as ASCII — while still being able to represent every character ever written. The W3C recommends UTF-8 as the default encoding for all web content, and it now accounts for over 98% of pages on the web.

Why does the euro sign € use 3 bytes (E2 82 AC)?

UTF-8 encodes characters outside the basic ASCII range using a multi-byte scheme. The euro sign has Unicode code point U+20AC. Because it falls in the U+0800–U+FFFF range, UTF-8 needs 3 bytes to encode it. The bit pattern of U+20AC (0010 0000 1010 1100) gets split across three bytes: 1110xxxx 10xxxxxx 10xxxxxx, giving E2 82 AC. You can read the full encoding algorithm in RFC 3629 §3.
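
For illustration, the 3-byte scheme can be hand-rolled with bit operations. encode3Byte below is a hypothetical helper that follows the bit pattern described above:

```javascript
// 3-byte UTF-8 encoding for a code point in U+0800..U+FFFF,
// per the 1110xxxx 10xxxxxx 10xxxxxx pattern in RFC 3629.
function encode3Byte(cp) {
  return [
    0xe0 | (cp >> 12),          // top 4 bits    -> 1110xxxx
    0x80 | ((cp >> 6) & 0x3f),  // middle 6 bits -> 10xxxxxx
    0x80 | (cp & 0x3f),         // bottom 6 bits -> 10xxxxxx
  ].map((b) => b.toString(16).toUpperCase());
}

console.log(encode3Byte(0x20ac)); // [ 'E2', '82', 'AC' ]
```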

What is the difference between UTF-8 and ASCII?

ASCII only covers 128 characters (0–127) and uses exactly one byte per character. UTF-8 covers all 1,114,112 Unicode code points using 1–4 bytes. For any character in the ASCII range, the UTF-8 encoding is identical to ASCII, so UTF-8 files containing only ASCII characters are byte-for-byte identical to ASCII files. Learn more at MDN Glossary: ASCII.
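
A quick way to see this identity, using the built-in TextEncoder (sketch only):

```javascript
// For ASCII-only text, the UTF-8 bytes equal the ASCII codes exactly.
const bytes = new TextEncoder().encode("Hello");
const asciiCodes = [..."Hello"].map((c) => c.charCodeAt(0));

console.log([...bytes]);  // [ 72, 101, 108, 108, 111 ]
console.log(asciiCodes);  // [ 72, 101, 108, 108, 111 ]  (same values)
```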

What formats does the decoder accept?

The decoder is flexible. It accepts hex bytes separated by spaces (48 65 6C 6C 6F), by commas (48,65,6C,6C,6F), with 0x prefixes (0x48 0x65), or with no separator at all (48656C6C6F). Case does not matter — e2 82 ac and E2 82 AC produce the same result.
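
A parser with roughly this flexibility could look like the following sketch (decodeHex is a hypothetical helper, not this tool's actual parser):

```javascript
// Normalise all the accepted input formats, then decode as UTF-8.
function decodeHex(input) {
  const clean = input.replace(/0x/gi, "").replace(/[\s,]+/g, "");
  if (clean.length % 2 !== 0 || /[^0-9a-f]/i.test(clean)) {
    throw new Error("not a valid hex byte sequence");
  }
  const bytes = Uint8Array.from(
    clean.match(/../g) ?? [],        // split into two-digit pairs
    (pair) => parseInt(pair, 16)
  );
  // fatal: true rejects malformed UTF-8 instead of emitting U+FFFD.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

console.log(decodeHex("48 65 6C 6C 6F")); // "Hello"
console.log(decodeHex("0x48,0x65"));      // "He"
console.log(decodeHex("e2 82 ac"));       // "€"
```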

How does this tool handle emoji?

Emoji occupy code points above U+FFFF, so they require 4 bytes in UTF-8. For example, 😀 (U+1F600) encodes to F0 9F 98 80. The tool uses the browser's native TextEncoder and TextDecoder APIs, so all emoji and surrogate pairs are handled correctly.
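
This is easy to verify with the built-in APIs. Note that in JavaScript the emoji is a single code point but two UTF-16 code units (a surrogate pair), which TextEncoder resolves before producing the 4 UTF-8 bytes:

```javascript
const bytes = new TextEncoder().encode("😀");

console.log(bytes.length);                     // 4
console.log(
  Array.from(bytes, (b) => b.toString(16).toUpperCase()).join(" ")
);                                             // "F0 9F 98 80"
console.log("😀".length);                      // 2 (UTF-16 code units)
console.log("😀".codePointAt(0).toString(16)); // "1f600"
```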

Can I use this to debug encoding problems in my app?

Yes. Paste suspicious text into Encode mode and check whether the byte sequence matches what your database or API is sending. This is particularly useful when you encounter garbled characters (mojibake) caused by a charset mismatch between your application layers.
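
As an example of the kind of mismatch described here, this sketch reproduces classic mojibake by decoding UTF-8 bytes as Windows-1252. The scenario is assumed for illustration, and windows-1252 decoding support depends on the runtime's ICU build:

```javascript
// UTF-8 bytes for "€" (E2 82 AC) misread as Windows-1252 downstream.
const utf8Bytes = new TextEncoder().encode("€");
const wrong = new TextDecoder("windows-1252").decode(utf8Bytes);

console.log(wrong); // "â‚¬"  (the garbled text users would see)
```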

Related Tools