May 14, 2026

How To Use Unicode in HTML

Using Unicode in HTML starts with declaring the correct character encoding. Without it, browsers may misread your characters and display garbled text. This guide explains what Unicode and UTF-8 are, why the <meta charset="UTF-8"> tag is non-negotiable, and the three methods for inserting any character into an HTML document: directly as a UTF-8 character, as a named HTML entity, or as a numeric character reference.

[Figure: Unicode character encoding chart showing UTF-8 code points in HTML]

What Unicode Is (and Why It Matters for HTML)

Unicode is a character encoding standard that assigns a unique code point to every character in every writing system. Code points are written in the form U+XXXX — for example, the letter A is U+0041 and the euro sign € is U+20AC. As of Unicode 15.1, the standard covers 149,813 characters across 161 scripts, ranging from basic Latin letters to CJK ideographs, mathematical symbols, and emoji.

UTF-8 is the encoding that translates Unicode code points into the bytes that travel across a network or sit in a file. It is a variable-width encoding: ASCII characters (U+0000–U+007F) use a single byte identical to their ASCII value, while characters beyond that range use 2, 3, or 4 bytes. This backwards compatibility with ASCII is one reason UTF-8 has become the dominant encoding on the web, and the WHATWG HTML Living Standard now requires authors to encode HTML documents as UTF-8.
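To see the variable width in practice, the short Python sketch below encodes a few sample characters and reports how many bytes each one occupies. The helper name utf8_width is made up for this illustration; str.encode and bytes.hex are standard Python.

```python
def utf8_width(ch: str) -> int:
    """Return the number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# One sample from each width class: ASCII, Latin-1 range, BMP, supplementary.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    code_point = f"U+{ord(ch):04X}"
    raw_bytes = ch.encode("utf-8").hex(" ")
    print(ch, code_point, utf8_width(ch), raw_bytes)
```

The four samples (A, é, €, 😀) occupy 1, 2, 3, and 4 bytes respectively, and the single byte for A is 41, the same value it has in ASCII.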

If you need to convert text or explore ASCII alongside Unicode, our guide to understanding ASCII character encoding covers the relationship between the two standards in depth.

The Charset Meta Tag — Why It Must Come First

Before you can safely use any Unicode character in HTML, you need to tell the browser how to interpret the bytes it receives. That is exactly what the <meta charset> tag does:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>My Page</title>
</head>
<body>
  <p>© 2026 — Unicode works here: 你好, こんにちは, مرحبا</p>
</body>
</html>

The WHATWG HTML Living Standard requires this tag to appear within the first 1,024 bytes of the document. Placing it as the very first element inside <head> guarantees that the browser reads it before attempting to decode any other content on the page.
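As a rough way to check this on your own files, the following Python sketch looks for the short declaration inside the first 1,024 bytes. It is a simplification under stated assumptions: the function name early_charset and the regular expression are invented for this example, and a real browser uses a more lenient prescan than a single pattern match.

```python
import re

# Crude pattern for the short form <meta charset="...">; assumes charset is
# the first attribute. Real HTML parsing is more forgiving than this.
META_CHARSET = re.compile(
    rb"<meta\s+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)[\"']?\s*/?>",
    re.IGNORECASE,
)


def early_charset(html_bytes: bytes, limit: int = 1024):
    """Return the declared charset (lowercased) if the whole meta tag sits
    inside the first `limit` bytes, otherwise None."""
    match = META_CHARSET.search(html_bytes[:limit])
    return match.group(1).decode("ascii").lower() if match else None


page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>x</title></head></html>'
print(early_charset(page))
```

A None result on a page that does contain the tag usually means a long comment, inline script, or other markup has pushed the declaration past the limit.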

Without the charset declaration, browsers fall back to heuristic encoding detection: an algorithm that guesses the encoding by scanning the bytes of the file. On a document with mostly ASCII content the guess is often right, but once you add characters outside the ASCII range (accented letters, currency symbols, emoji) a wrong guess produces garbled output, such as the classic â€™ in place of a right apostrophe (’).
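The garbling is easy to reproduce. The Python sketch below encodes text as UTF-8 and then decodes the same bytes with a different encoding, which is effectively what a browser does after a wrong guess. It assumes the wrong guess is windows-1252 (cp1252 in Python); the function name misdecode is hypothetical.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# U+2019 (right single quotation mark) is three bytes in UTF-8: E2 80 99.
# Read as windows-1252, each byte becomes its own character.
print(misdecode("\u2019"))
print(misdecode("caf\u00e9", "latin-1"))
```

Each multi-byte character turns into two or three unrelated characters, which is why a single wrong guess can damage every non-ASCII character on the page while leaving the ASCII text intact.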

One more thing: prefer the short form over the older <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. That longer syntax comes from HTML4. The HTML Living Standard still accepts it, but it is more verbose and offers no advantage. The short form <meta charset="UTF-8"> is the usual choice in modern HTML.

Three Ways to Include a Unicode Character in HTML

There are exactly three methods for getting a Unicode character into your HTML markup. Each has trade-offs around readability, portability, and browser support.

1. Direct UTF-8 Character

The simplest approach — just type or paste the character into your source file:

<p>€ 10.00</p>
<p>© 2026 Acme Corp.</p>
<p>Temperature: 37°C</p>

This works perfectly as long as two conditions are met: the file is saved with UTF-8 encoding (most modern editors — VS Code, Sublime Text, JetBrains IDEs — default to UTF-8), and the document includes <meta charset="UTF-8">. When both are true, direct characters produce the most readable source code.

2. Named HTML Entity

A named entity is a short mnemonic starting with & and ending with ;:

<p>&euro; 10.00</p>
<p>&copy; 2026 Acme Corp.</p>
<p>Protons &amp; Neutrons</p>

Named entities are human-readable, work in all browsers, and do not require the file to be saved as UTF-8. Their limitation is coverage: not every Unicode character has a named entity. The HTML Living Standard defines a fixed list of named character references — characters outside that list must use a numeric reference instead.
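If you want to check whether a particular character has a named entity, Python's standard library ships a copy of the HTML named character reference table as html.entities.html5. The sketch below inverts that table; the function name entity_names is an assumption made for this example.

```python
from html.entities import html5


def entity_names(ch: str) -> list[str]:
    """List the named character references (ending in ';') that map to ch."""
    return sorted(
        f"&{name}"
        for name, value in html5.items()
        if value == ch and name.endswith(";")
    )


print(entity_names("\u20ac"))      # euro sign
print(entity_names("\U0001F600"))  # grinning face emoji
```

An empty list means the character has no named entity and needs either a direct character or a numeric reference. Some characters have several names, so the function returns a list rather than a single string.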

3. Numeric Character Reference (NCR)

A numeric character reference encodes the Unicode code point directly — either as a decimal number or in hexadecimal:

<!-- Decimal form: &#N; -->
<p>&#8364; 10.00</p>   <!-- € — U+20AC = 8364 decimal -->

<!-- Hexadecimal form: &#xN; -->
<p>&#x20AC; 10.00</p>  <!-- same character, hex code point -->

<!-- Emoji using hex NCR -->
<p>&#x1F600;</p>       <!-- 😀 — U+1F600 -->

NCRs work for every Unicode code point, including emoji and supplementary characters above U+FFFF, in every browser, regardless of the document's file encoding. The cost is readability: &#x20AC; is much harder to recognise at a glance than &euro; or a literal €.
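Both forms are simply the code point written in a different base, as this small Python sketch shows. The helper to_ncr is a made-up name for illustration; html.unescape from the standard library is used only to confirm the round trip.

```python
import html


def to_ncr(ch: str, hexadecimal: bool = False) -> str:
    """Build a numeric character reference for a single character."""
    code_point = ord(ch)
    return f"&#x{code_point:X};" if hexadecimal else f"&#{code_point};"


euro = "\u20ac"
print(to_ncr(euro))                      # decimal form
print(to_ncr(euro, hexadecimal=True))    # hexadecimal form
print(html.unescape(to_ncr(euro)) == euro)
```

The hexadecimal form has the advantage of matching the U+ notation used in Unicode charts, so U+20AC becomes &#x20AC; with no conversion.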

Comparison of the Three Methods

| Method | Example for € | Pros | Cons |
|---|---|---|---|
| Direct UTF-8 char | € | Most readable source code | File must be UTF-8; needs correct meta charset |
| Named entity | &euro; | Always works; human-readable | Not every character has a named entity |
| Decimal NCR | &#8364; | Works everywhere | Not human-readable |
| Hex NCR | &#x20AC; | Works everywhere; matches U+ code point | Not human-readable |

Common Unicode Characters and Their HTML Codes

These are the characters that come up most often in everyday HTML authoring. The named entity and decimal numeric reference are both valid; use whichever is clearest in context. You can also use our HTML Encoder / Decoder tool to look up or convert any character on demand.

| Character | Code Point | Named Entity | Decimal NCR |
|---|---|---|---|
| © (copyright) | U+00A9 | &copy; | &#169; |
| ® (registered) | U+00AE | &reg; | &#174; |
| ™ (trademark) | U+2122 | &trade; | &#8482; |
| € (euro) | U+20AC | &euro; | &#8364; |
| £ (pound) | U+00A3 | &pound; | &#163; |
| — (em dash) | U+2014 | &mdash; | &#8212; |
| – (en dash) | U+2013 | &ndash; | &#8211; |
| “ (left double quote) | U+201C | &ldquo; | &#8220; |
| ” (right double quote) | U+201D | &rdquo; | &#8221; |
| (non-breaking space) | U+00A0 | &nbsp; | &#160; |
| ← (left arrow) | U+2190 | &larr; | &#8592; |
| → (right arrow) | U+2192 | &rarr; | &#8594; |
| ✓ (check mark) | U+2713 | &check; | &#10003; |
| ★ (black star) | U+2605 | &starf; | &#9733; |
| 😀 (grinning face) | U+1F600 | none | &#128512; |

Emoji and Characters Above U+FFFF

The Unicode code space is divided into 17 planes. The first — the Basic Multilingual Plane (BMP) — covers U+0000 through U+FFFF and contains the vast majority of commonly used characters. Emoji and many historic scripts live in the supplementary planes: U+10000 and above.
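Because each plane spans exactly 0x10000 code points, the plane number is just the code point shifted right by 16 bits. The Python sketch below illustrates this; the function names plane and is_supplementary are invented for the example.

```python
def plane(ch: str) -> int:
    """Return the Unicode plane (0 to 16) that a character belongs to."""
    return ord(ch) >> 16


def is_supplementary(ch: str) -> bool:
    """True for characters outside the Basic Multilingual Plane."""
    return ord(ch) > 0xFFFF


for ch in ["A", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", "plane", plane(ch), "supplementary:", is_supplementary(ch))
```

The letter A and the euro sign both report plane 0, while the grinning face emoji at U+1F600 sits in plane 1.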

In UTF-16 (the internal representation used by JavaScript strings), supplementary characters require two 16-bit code units called a surrogate pair. In UTF-8 they use 4 bytes. In HTML you do not need to worry about surrogates at all — the hex numeric character reference handles them natively:

<!-- Hex NCR — works for any code point including emoji -->
<p>&#x1F600;</p>  <!-- 😀 grinning face, U+1F600 -->
<p>&#x1F4A1;</p>  <!-- 💡 light bulb,    U+1F4A1 -->
<p>&#x1F680;</p>  <!-- 🚀 rocket,         U+1F680 -->

<!-- Direct UTF-8 — also works perfectly in a UTF-8 document -->
<p>😀 💡 🚀</p>

Both approaches produce identical output in the browser. The hex NCR is useful when your text editor or deployment pipeline might corrupt multi-byte characters; the direct form is more readable.
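If a pipeline is known to mangle non-ASCII bytes, one option is to convert the markup to pure ASCII before it enters that pipeline. The Python sketch below uses the built-in xmlcharrefreplace error handler, which emits decimal rather than hexadecimal references; the function name ascii_safe is hypothetical.

```python
import html


def ascii_safe(markup: str) -> str:
    """Replace every non-ASCII character with a decimal numeric character
    reference, leaving tags and ASCII text untouched."""
    return markup.encode("ascii", "xmlcharrefreplace").decode("ascii")


source = "<p>\U0001F600 \u20ac 10.00</p>"
converted = ascii_safe(source)
print(converted)

# Decoding the references restores the original characters.
print(html.unescape(converted) == source)
```

The output contains only ASCII bytes, so it survives any ASCII-compatible encoding, and the browser renders it identically to the direct UTF-8 version.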

For encoding and decoding HTML entities in your workflow, see our guide on how to encode HTML code — it covers encoding reserved characters like <, >, and & for XSS prevention.

The HTTP Content-Type Header

The character encoding declaration has two locations: the meta charset tag in the HTML and the Content-Type HTTP response header sent by the server. When both are present, the HTTP header wins — it takes precedence over the in-document tag.

For a correctly configured server, the response header should look like:

Content-Type: text/html; charset=utf-8

For nginx, add charset utf-8; to your server block. For Apache, use AddDefaultCharset UTF-8 in your configuration or .htaccess file. Most CDNs and hosting platforms pass through whatever the origin server sends, so setting the header at the origin also corrects the responses they cache and serve.

Still include <meta charset="UTF-8"> even when the HTTP header is set correctly. When a user saves the page and opens it locally, there is no HTTP header — the meta tag is the only encoding declaration available.
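The precedence rule can be sketched in a few lines of Python. This is a simplified model, not the browser's full algorithm: it ignores the byte order mark, which browsers check before either declaration, and the function names header_charset and effective_encoding are assumptions made for this example. The charset parameter is parsed with email.message.Message from the standard library.

```python
from email.message import Message


def header_charset(content_type: str):
    """Extract the charset parameter from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None if absent


def effective_encoding(content_type, meta_charset):
    """HTTP header wins; otherwise the meta tag; otherwise None, meaning
    the browser has to guess."""
    from_header = header_charset(content_type) if content_type else None
    if from_header:
        return from_header
    return meta_charset.lower() if meta_charset else None


print(effective_encoding("text/html; charset=ISO-8859-1", "UTF-8"))
print(effective_encoding(None, "UTF-8"))
```

The first call models a misconfigured server overriding a correct meta tag, which is why the two declarations should always agree. The second models a page opened from disk, where only the meta tag is available.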

Best Practices Summary

  • Always declare UTF-8. Put <meta charset="UTF-8"> as the first element inside <head>, within the first 1,024 bytes of the document.
  • Save files as UTF-8. Modern editors default to UTF-8; verify in the status bar or file settings if you are unsure.
  • Use direct characters for readability. In a properly encoded document, typing € directly is cleaner than &euro; or &#8364;.
  • Use named entities for reserved characters. Always write &amp;, &lt;, and &gt; when those characters appear as content rather than markup.
  • Use numeric references for characters with no named entity. For example, &#x1F600; (😀).
  • Configure the server Content-Type header. Set Content-Type: text/html; charset=utf-8 at the server or CDN level so both the HTTP declaration and the meta tag are consistent.
  • Prefer the short meta tag. <meta charset="UTF-8"> is the concise modern form; the older http-equiv form is still valid but needlessly verbose.
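When page content is generated by a program, the practices on direct characters and reserved characters combine naturally: escape only the reserved characters and leave everything else as direct UTF-8. A minimal Python sketch, assuming the text is destined for element content rather than an attribute value (the function name escape_text is invented for this example):

```python
import html


def escape_text(text: str) -> str:
    """Escape &, < and > for use as element content; other characters,
    including non-ASCII ones, pass through unchanged."""
    return html.escape(text, quote=False)


print(escape_text("Protons & Neutrons"))
print(escape_text("\u20ac 10.00 < \u20ac 20.00"))
```

For attribute values, call html.escape with its default quote=True so that quotation marks are escaped as well.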

If you need to encode or decode text for URLs rather than HTML, see our Base64 encoder / decoder for binary-to-text encoding tasks.

Frequently Asked Questions

What is the difference between UTF-8 and Unicode?

Unicode is the standard that defines code points — the numbers assigned to characters. UTF-8 is one encoding that specifies how those code points are stored as bytes. There are other Unicode encodings (UTF-16, UTF-32), but UTF-8 is the required encoding for HTML documents per the WHATWG HTML Living Standard.

Do I need the meta charset tag if my server sends the correct Content-Type header?

The HTTP header takes precedence, but you should still include the meta tag. When a file is opened locally — dragged into a browser from the desktop, or viewed via a file:// URL — there is no HTTP header, and the meta tag is the only encoding declaration the browser sees.

Can I use emoji directly in HTML?

Yes. Save your HTML file as UTF-8, include <meta charset="UTF-8">, and paste the emoji directly into the source. Alternatively, use the hex numeric character reference — &#x1F600; — which works regardless of the file encoding.

What is a numeric character reference in HTML?

A numeric character reference (NCR) encodes a Unicode code point directly in HTML. The decimal form is &#N; (e.g. &#8364; for €) and the hexadecimal form is &#xN; (e.g. &#x20AC; for the same character). NCRs work everywhere because they reference the code point directly rather than depending on the document's byte encoding.

Which method should I use — direct character, named entity, or numeric reference?

In a modern UTF-8 HTML document: use direct characters for readability, named entities (&amp;, &lt;, &gt;) for reserved characters, and numeric character references for any character that has no named entity, such as emoji (&#x1F600;).
