How To Use Unicode in HTML

What Unicode Is (and Why It Matters for HTML)

Unicode is a character encoding standard that assigns a unique code point to every character in every writing system. Code points are written in the form U+XXXX — for example, the letter A is U+0041 and the euro sign € is U+20AC. As of Unicode 15.1, the standard covers 149,813 characters across 161 scripts, ranging from basic Latin letters to CJK ideographs, mathematical symbols, and emoji.

UTF-8 is the encoding that translates Unicode code points into the bytes that travel across a network or sit in a file. It is a variable-width encoding: ASCII characters (U+0000–U+007F) use a single byte identical to their ASCII value, while characters beyond that range use 2, 3, or 4 bytes. This backward compatibility with ASCII is one reason UTF-8 has become the dominant encoding on the web. The WHATWG HTML Living Standard now requires authors to encode HTML documents as UTF-8.
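To make the variable-width behaviour concrete, here is a minimal Python sketch that reports the code point and the UTF-8 bytes for a few sample characters. The helper name utf8_profile is our own, chosen for illustration only.

```python
def utf8_profile(ch: str) -> tuple[str, int, str]:
    """Return (code point label, UTF-8 byte count, bytes as hex) for one character."""
    encoded = ch.encode("utf-8")
    return (f"U+{ord(ch):04X}", len(encoded), encoded.hex(" "))


# One sample from each width class: 1, 2, 3, and 4 bytes.
for ch in ["A", "é", "€", "😀"]:
    label, size, hex_bytes = utf8_profile(ch)
    print(f"{ch}  {label:<8} {size} byte(s): {hex_bytes}")
```

The letter A stays a single byte (41), exactly as in ASCII, while the euro sign takes three bytes and the emoji takes four.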

If you need to convert text or explore ASCII alongside Unicode, our guide to understanding ASCII character encoding covers the relationship between the two standards in depth.

The Charset Meta Tag — Why It Must Come First

Before you can safely use any Unicode character in HTML, you need to tell the browser how to interpret the bytes it receives. That is exactly what the <meta charset> tag does:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>My Page</title>
</head>
<body>
  <p>© 2026 — Unicode works here: 你好, こんにちは, مرحبا</p>
</body>
</html>

The WHATWG HTML Living Standard requires this tag to appear within the first 1,024 bytes of the document. Placing it as the very first element inside <head> guarantees that the browser reads it before attempting to decode any other content on the page.
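As a rough way to check the 1,024-byte rule in a build script, the sketch below looks for a complete short-form declaration inside the first 1,024 bytes of a file. It is a simplification: the function name declares_utf8_early and the regular expression are our own, and the pattern only recognises the short meta charset form, not everything a browser's own scan accepts.

```python
import re

# Simplified pattern: matches only the short <meta charset=...> form for UTF-8.
META_CHARSET = re.compile(rb"""<meta\s+charset\s*=\s*["']?utf-8["']?\s*/?>""", re.IGNORECASE)


def declares_utf8_early(document: bytes, limit: int = 1024) -> bool:
    """True if a complete <meta charset="utf-8"> tag lies within the first `limit` bytes."""
    # Slicing first means a tag that starts before the limit but ends after it does not count.
    return META_CHARSET.search(document[:limit]) is not None


page = b'<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"><title>My Page</title></head></html>'
print(declares_utf8_early(page))
```

A document that pushes the tag down the page, for example behind a long comment or inline script, would fail this check.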

Without the charset declaration, browsers fall back to heuristic encoding detection — an algorithm that guesses the encoding by scanning the byte sequence of the file. On a document with mostly ASCII content the guess is often right, but once you add characters outside the ASCII range (accented letters, currency symbols, emoji) a wrong guess produces garbled output: the classic â€™ in place of a right apostrophe.
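The garbled apostrophe is easy to reproduce. The sketch below assumes the wrong guess is windows-1252 (Python's cp1252 codec), a common legacy fallback. The function name misdecode is illustrative.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    utf8_bytes = text.encode("utf-8")
    # errors="replace" keeps the demo from raising on bytes the legacy codec leaves undefined.
    return utf8_bytes.decode(wrong_encoding, errors="replace")


apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK, stored as bytes E2 80 99 in UTF-8
print(misdecode(apostrophe))
```

Each of the three UTF-8 bytes is shown as a separate windows-1252 character, which is why one apostrophe turns into the three-character sequence â€™.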

One more thing: prefer the short form <meta charset="UTF-8"> over the older verbose form <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. The verbose syntax comes from HTML4. The HTML Living Standard still accepts it, but the short form is easier to read and harder to mistype. Use one form or the other, never both in the same document.

Three Ways to Include a Unicode Character in HTML

There are exactly three methods for getting a Unicode character into your HTML markup. Each has trade-offs around readability, portability, and browser support.

1. Direct UTF-8 Character

The simplest approach — just type or paste the character into your source file:

<p>€ 10.00</p>
<p>© 2026 Acme Corp.</p>
<p>Temperature: 37°C</p>

This works perfectly as long as two conditions are met: the file is saved with UTF-8 encoding (most modern editors — VS Code, Sublime Text, JetBrains IDEs — default to UTF-8), and the document includes <meta charset="UTF-8">. When both are true, direct characters produce the most readable source code.
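If pages are generated by a script rather than typed in an editor, the first condition means passing the encoding explicitly when writing the file. The following is a minimal sketch in Python. The file name price.html, the PAGE template, and the helper write_utf8_page are placeholders for illustration.

```python
from pathlib import Path
import tempfile

PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Price</title>
</head>
<body>
  <p>€ 10.00</p>
</body>
</html>
"""


def write_utf8_page(path: Path, html: str = PAGE) -> bytes:
    """Write html to path as UTF-8 and return the raw bytes stored on disk."""
    # Passing encoding= explicitly avoids depending on the platform's default encoding.
    path.write_text(html, encoding="utf-8")
    return path.read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    raw = write_utf8_page(Path(tmp) / "price.html")
    # The euro sign should be stored as the three bytes E2 82 AC.
    print(b"\xe2\x82\xac" in raw)
```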

2. Named HTML Entity

A named entity is a short mnemonic starting with & and ending with ;:

<p>&euro; 10.00</p>
<p>&copy; 2026 Acme Corp.</p>
<p>Protons &amp; Neutrons</p>

Named entities are human-readable, work in all browsers, and do not require the file to be saved as UTF-8. Their limitation is coverage: not every Unicode character has a named entity. The HTML Living Standard defines a fixed list of named character references — characters outside that list must use a numeric reference instead.
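One way to check whether a character has a named entity is to consult a copy of the named character reference list. The sketch below uses the html5 table in Python's standard html.entities module, which maps HTML5 reference names to characters. The helper entity_names is our own.

```python
import html
from html.entities import html5


def entity_names(ch: str) -> list[str]:
    """List the named character references (without & and ;) that map to exactly ch."""
    return sorted(
        name[:-1]
        for name, value in html5.items()
        if name.endswith(";") and value == ch
    )


print(html.unescape("&euro; 10.00 &amp; &copy; 2026"))  # decode entities back to characters
print(entity_names("€"))   # names available for the euro sign
print(entity_names("😀"))  # empty list: fall back to a numeric reference
```

An empty result means the character is outside the fixed list, so a numeric reference is the only escaped form available.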

3. Numeric Character Reference (NCR)

A numeric character reference encodes the Unicode code point directly — either as a decimal number or in hexadecimal:

<!-- Decimal form: &#N; -->
<p>&#8364; 10.00</p>   <!-- € — U+20AC = 8364 decimal -->

<!-- Hexadecimal form: &#xN; -->
<p>&#x20AC; 10.00</p>  <!-- same character, hex code point -->

<!-- Emoji using hex NCR -->
<p>&#x1F600;</p>       <!-- 😀 — U+1F600 -->

NCRs work for every Unicode code point, including emoji and supplementary characters above U+FFFF, in every browser, regardless of the document's file encoding. The cost is readability: &#x20AC; is much harder to recognise at a glance than &euro; or a literal €.
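Both NCR forms are just the code point number written in base 10 or base 16, so they can be derived mechanically. The sketch below uses a helper named to_ncr (our own name) and Python's html.unescape to confirm that each form decodes back to the original character.

```python
import html


def to_ncr(ch: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) numeric character references for one character."""
    code_point = ord(ch)
    return (f"&#{code_point};", f"&#x{code_point:X};")


for ch in ["€", "😀"]:
    decimal, hexadecimal = to_ncr(ch)
    # Round-trip check: both references decode to the character we started with.
    assert html.unescape(decimal) == ch and html.unescape(hexadecimal) == ch
    print(ch, decimal, hexadecimal)
```

The hex form has the practical advantage that its digits match the U+ notation used in Unicode charts.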

Comparison of the Three Methods

Method	Example for €	Pros	Cons
Direct UTF-8 char	€	Most readable source code	File must be UTF-8; needs correct meta charset
Named entity	&euro;	Always works; human-readable	Not every character has a named entity
Decimal NCR	&#8364;	Works everywhere	Not human-readable
Hex NCR	&#x20AC;	Works everywhere; matches U+ code point	Not human-readable

Common Unicode Characters and Their HTML Codes

These are the characters that come up most often in everyday HTML authoring. The named entity and decimal numeric reference are both valid; use whichever is clearest in context. You can also use our HTML Encoder / Decoder tool to look up or convert any character on demand.

Character	Code Point	Named Entity	Decimal NCR
© (copyright)	U+00A9	&copy;	&#169;
® (registered)	U+00AE	&reg;	&#174;
™ (trademark)	U+2122	&trade;	&#8482;
€ (euro)	U+20AC	&euro;	&#8364;
£ (pound)	U+00A3	&pound;	&#163;
— (em dash)	U+2014	&mdash;	&#8212;
– (en dash)	U+2013	&ndash;	&#8211;
“ (left double quote)	U+201C	&ldquo;	&#8220;
” (right double quote)	U+201D	&rdquo;	&#8221;
(non-breaking space)	U+00A0	&nbsp;	&#160;
← (left arrow)	U+2190	&larr;	&#8592;
→ (right arrow)	U+2192	&rarr;	&#8594;
✓ (check mark)	U+2713	&check;	&#10003;
★ (black star)	U+2605	&starf;	&#9733;
😀 (grinning face)	U+1F600	none	&#128512;

Emoji and Characters Above U+FFFF

The Unicode code space is divided into 17 planes. The first — the Basic Multilingual Plane (BMP) — covers U+0000 through U+FFFF and contains the vast majority of commonly used characters. Emoji and many historic scripts live in the supplementary planes: U+10000 and above.

In UTF-16 (the internal representation used by JavaScript strings), supplementary characters require two 16-bit code units called a surrogate pair. In UTF-8 they use 4 bytes. In HTML you do not need to worry about surrogates at all — the hex numeric character reference handles them natively:
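For readers curious about what the surrogate pair actually is, the arithmetic can be sketched in a few lines of Python. The function name surrogate_pair is illustrative. The constants 0xD800 and 0xDC00 are the starts of the high and low surrogate ranges defined by UTF-16.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into its UTF-16 (high, low) surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF use surrogate pairs")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = surrogate_pair(0x1F600)
print(f"U+1F600 -> {high:04X} {low:04X}")
print(len("😀".encode("utf-8")), "bytes in UTF-8")
```

For U+1F600 this gives the pair D83D DE00, which is why a single emoji has a length of 2 in a JavaScript string, while the same character occupies four bytes in UTF-8.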

<!-- Hex NCR — works for any code point including emoji -->
<p>&#x1F600;</p>  <!-- 😀 grinning face, U+1F600 -->
<p>&#x1F4A1;</p>  <!-- 💡 light bulb,    U+1F4A1 -->
<p>&#x1F680;</p>  <!-- 🚀 rocket,         U+1F680 -->

<!-- Direct UTF-8 — also works perfectly in a UTF-8 document -->
<p>😀 💡 🚀</p>

Both approaches produce identical output in the browser. The hex NCR is useful when your text editor or deployment pipeline might corrupt multi-byte characters; the direct form is more readable.
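If a pipeline step is known to mangle multi-byte characters, one option is to convert every non-ASCII character to a hex NCR before the file reaches that step. The following is a minimal sketch. The function name ascii_safe_html is our own, and it deliberately leaves ASCII markup characters such as <, >, and & untouched, so it is not a substitute for escaping untrusted input.

```python
import html


def ascii_safe_html(markup: str) -> str:
    """Replace every non-ASCII character with its hex numeric character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in markup)


source = "<p>😀 💡 🚀</p>"
safe = ascii_safe_html(source)
print(safe)
# Decoding the references restores the original text, so the browser output is unchanged.
print(html.unescape(safe) == source)
```

The result contains only ASCII bytes, so it stays intact through tools that are not UTF-8 aware.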

For encoding and decoding HTML entities in your workflow, see our guide on how to encode HTML code — it covers encoding reserved characters like <, >, and & for XSS prevention.

The HTTP Content-Type Header

The character encoding declaration has two locations: the meta charset tag in the HTML and the Content-Type HTTP response header sent by the server. When both are present, the HTTP header wins — it takes precedence over the in-document tag.

For a correctly configured server, the response header should look like:

Content-Type: text/html; charset=utf-8

For nginx, add charset utf-8; to your server block. For Apache, use AddDefaultCharset UTF-8 in your configuration or .htaccess file. Most CDNs and hosting platforms pass through whatever the origin server sends, so fixing the header at the origin also corrects the responses served through the CDN once cached copies are refreshed.

Still include <meta charset="UTF-8"> even when the HTTP header is set correctly. When a user saves the page and opens it locally, there is no HTTP header — the meta tag is the only encoding declaration available.
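The precedence rule can be expressed as a small function. This is a simplified model rather than the browser's full algorithm, which also considers a byte order mark. The names header_charset and effective_charset are our own, and the header parsing relies on email.message.Message from Python's standard library.

```python
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, lowercased."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def effective_charset(content_type: Optional[str], meta_charset: Optional[str]) -> Optional[str]:
    """Simplified precedence: the HTTP header wins; the meta tag is the fallback."""
    if content_type:
        from_header = header_charset(content_type)
        if from_header:
            return from_header
    return meta_charset.lower() if meta_charset else None


print(effective_charset("text/html; charset=utf-8", "windows-1252"))  # header wins
print(effective_charset(None, "UTF-8"))  # local file: only the meta tag is available
```

The second call models a page opened from disk, where the meta tag is the only declaration left.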

Best Practices Summary

Always declare UTF-8. Put <meta charset="UTF-8"> as the first element inside <head>, within the first 1,024 bytes of the document.
Save files as UTF-8. Modern editors default to UTF-8; verify in the status bar or file settings if you are unsure.
Use direct characters for readability. In a properly encoded document, typing € directly is cleaner than &euro; or &#8364;.
Use named entities for reserved characters. Always write &amp;, &lt;, and &gt; when those characters appear as content rather than markup.
Use numeric references for characters with no named entity. For example, &#x1F600; for 😀.
Configure the server Content-Type header. Set Content-Type: text/html; charset=utf-8 at the server or CDN level so both the HTTP declaration and the meta tag are consistent.
Prefer the short meta tag. The short form <meta charset="UTF-8"> is the standard way to declare the encoding. The older http-equiv form is still valid but more verbose, and a document should not contain both.

If you need to encode or decode text for URLs rather than HTML, see our Base64 encoder / decoder for binary-to-text encoding tasks.

Frequently Asked Questions

What is the difference between UTF-8 and Unicode?

Unicode is the standard that defines code points — the numbers assigned to characters. UTF-8 is one encoding that specifies how those code points are stored as bytes. There are other Unicode encodings (UTF-16, UTF-32), but UTF-8 is the required encoding for HTML documents per the WHATWG HTML Living Standard.

Do I need the meta charset tag if my server sends the correct Content-Type header?

The HTTP header takes precedence, but you should still include the meta tag. When a file is opened locally — dragged into a browser from the desktop, or viewed via a file:// URL — there is no HTTP header, and the meta tag is the only encoding declaration the browser sees.

Can I use emoji directly in HTML?

Yes. Save your HTML file as UTF-8, include <meta charset="UTF-8">, and paste the emoji directly into the source. Alternatively, use the hex numeric character reference, for example &#x1F600; for 😀, which works regardless of the file encoding.

What is a numeric character reference in HTML?

A numeric character reference (NCR) encodes a Unicode code point directly in HTML. The decimal form is &#N; (e.g. &#8364; for €) and the hexadecimal form is &#xN; (e.g. &#x20AC; for the same character). NCRs work everywhere because they reference the code point directly rather than depending on the document's byte encoding.

Which method should I use — direct character, named entity, or numeric reference?

In a modern UTF-8 HTML document: use direct characters for readability, named entities (&amp;, &lt;, &gt;) for reserved characters, and numeric character references for any character that has no named entity, such as emoji (&#x1F600; for 😀).
