Base64 is one of those things developers encounter constantly — in JWT tokens, CSS data URIs, HTTP Basic Auth, email attachments, and API responses — yet few people have actually sat down and understood how it works at the bit level. This guide covers everything from the underlying algorithm to real-world usage patterns and the mistakes that trip people up most often.
What Is Base64?
Base64 is a binary-to-text encoding scheme. It takes arbitrary binary data — bytes that might include any value from 0 to 255 — and represents that data using only 64 specific printable ASCII characters. Those 64 characters are: A–Z (26), a–z (26), 0–9 (10), plus + and /. That's where the "64" in the name comes from.
The reason this exists is historical. Many protocols and systems were designed for text: SMTP for email, HTTP headers, HTML attributes, and XML documents. These systems either couldn't handle raw binary bytes or assigned special meaning to certain byte values, so binary data could arrive corrupted or unreadable. Base64 sidesteps the problem by converting everything to characters those systems can handle safely.
Base64 is formally defined in RFC 4648 (2006), which covers both the standard alphabet (+/) and the URL-safe alphabet (-_). The older RFC 2045 defined Base64 for MIME email specifically.
How the Algorithm Works
Understanding the actual bit-manipulation behind Base64 makes everything else — including the 33% size overhead and the padding characters — immediately obvious.
Step 1: Take 3 bytes of input
Base64 processes input in 3-byte (24-bit) groups. Each byte is 8 bits, so 3 bytes gives us 24 bits total.
"Man" → bytes 77, 97, 110
→ bits 01001101 01100001 01101110
Step 2: Split into four 6-bit groups
Those 24 bits are re-grouped into four chunks of 6 bits each. 6 bits can represent values 0–63: exactly 64 possible values, one for each character in the Base64 alphabet.
010011 010110 000101 101110 → values 19, 22, 5, 46
Step 3: Look up each 6-bit value in the alphabet
Each 6-bit number maps to a character in the Base64 alphabet (A–Z are 0–25, a–z are 26–51, 0–9 are 52–61, + is 62, / is 63):
19 → T, 22 → W, 5 → F, 46 → u, so "Man" encodes to "TWFu"
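The three steps above can be sketched in a few lines of Python. This is an illustrative encoder for one complete 3-byte group only (no padding handling); `ALPHABET` and `encode_group` are names invented for this example, and real code should use the standard library's `base64` module.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (illustrative only)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Step 1: pack the three bytes into one 24-bit integer.
    n = (three_bytes[0] << 16) | (three_bytes[1] << 8) | three_bytes[2]
    # Step 2: slice out four 6-bit values, most significant first.
    sextets = [(n >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Step 3: look each value up in the alphabet.
    return "".join(ALPHABET[s] for s in sextets)

print(encode_group(b"Man"))  # TWFu

# Cross-check against the standard library.
assert encode_group(b"Man") == base64.b64encode(b"Man").decode("ascii")
```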
Why Base64 Adds 33% to File Size
The size increase follows directly from the algorithm. Every 3 bytes of input produces 4 output characters. Since each ASCII character is 1 byte, you're storing 3 bytes of data using 4 bytes of text — a ratio of 4/3 = 1.333, or ~33% overhead.
| Original size | Base64 size | Overhead |
|---|---|---|
| 1 KB | ~1.33 KB | +0.33 KB |
| 10 KB | ~13.3 KB | +3.3 KB |
| 100 KB PNG | ~133 KB | +33 KB |
| 1 MB image | ~1.33 MB | +0.33 MB |
For small files — icons, small logos, inline SVGs — the overhead is acceptable and you save an HTTP request. For anything above roughly 10–20 KB, a separate file request with proper HTTP caching is more efficient.
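To make the 4/3 ratio concrete, here is a small sketch that predicts the encoded length and checks it against Python's `base64` module. The helper name `base64_length` is made up for this example, and the formula assumes no line breaks are inserted into the output (MIME-style line wrapping would add a little more).

```python
import base64

def base64_length(n_bytes: int, padded: bool = True) -> int:
    """Number of Base64 characters needed for n_bytes of input (no line breaks)."""
    if padded:
        # Every started 3-byte group costs 4 output characters.
        return 4 * ((n_bytes + 2) // 3)
    # Without '=' padding the length is ceil(4 * n / 3).
    return (4 * n_bytes + 2) // 3

for size in (1024, 10 * 1024, 100 * 1024):
    encoded = base64.b64encode(bytes(size))
    assert len(encoded) == base64_length(size)
    print(size, "bytes ->", len(encoded), "characters")
```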
Padding Characters: What = Means
Base64 processes input 3 bytes at a time. When the input length isn't a multiple of 3, the last group has 1 or 2 bytes instead of 3. Padding characters (=) fill the gap so the output length is always a multiple of 4.
"M" → "TQ==" (1 byte → 2 chars + padding)
"Ma" → "TWE=" (2 bytes → 3 chars + padding)
Decoders use the number of = characters to figure out how many bytes to expect at the end. Some implementations (URL-safe Base64 in particular) omit padding entirely, since the decoder can work out the missing padding from the length of the encoded string.
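As a rough illustration of the padding rule, the sketch below strips the = characters and then restores them from the string length alone. `add_padding` is a hypothetical helper written for this example, not part of any library.

```python
import base64

def add_padding(encoded: str) -> str:
    """Restore '=' padding so the length is a multiple of 4."""
    return encoded + "=" * (-len(encoded) % 4)

for raw in (b"M", b"Ma", b"Man"):
    padded = base64.b64encode(raw).decode("ascii")
    stripped = padded.rstrip("=")
    # The padding is fully recoverable from the stripped length.
    assert add_padding(stripped) == padded
    assert base64.b64decode(add_padding(stripped)) == raw
    print(raw, padded, stripped)
```

Python's `base64.b64decode` rejects input with missing padding, so a helper along these lines is usually needed before decoding unpadded strings.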
Standard vs URL-Safe Base64
Standard Base64 uses + for value 62 and / for value 63. Both characters carry special meaning in URLs: + is interpreted as a space in query strings, and / separates URL path segments. Placing a standard Base64 string in a URL either corrupts it or requires percent-encoding (%2B, %2F).
URL-safe Base64 (RFC 4648 §5) solves this by substituting: + → - and / → _. Padding is typically stripped too.
| Feature | Standard Base64 | URL-safe Base64 |
|---|---|---|
| Value 62 | + | - |
| Value 63 | / | _ |
| Padding | = | Often omitted |
| Safe in URLs | No — needs escaping | Yes |
| Safe in filenames | No — / breaks paths | Yes |
| Used in | MIME email, data URIs | JWT, OAuth, cookies, filenames |
If the Base64 string goes in a URL, a cookie, a filename, or an HTTP header that doesn't allow + and /, use URL-safe. If it goes in a data URI or MIME email body, use standard.
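The difference between the two alphabets only shows up when values 62 or 63 occur. The following sketch picks two input bytes chosen specifically to produce both, then encodes them with each variant from Python's standard `base64` module.

```python
import base64

raw = bytes([0xFB, 0xFF])  # chosen so the 6-bit values 62 and 63 both appear

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=

# Converting between the two is a plain character substitution.
assert standard.translate(str.maketrans("+/", "-_")) == url_safe
assert base64.urlsafe_b64decode(url_safe) == raw
```

Note that Python's `urlsafe_b64encode` keeps the = padding; stripping it is a separate step if the target format expects unpadded output.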
Base64 in JavaScript
Browsers, and Node.js 16 and later, provide global btoa() and atob() functions for encoding and decoding:
// Encode
btoa("Hello") // → "SGVsbG8="
btoa("Man") // → "TWFu"
// Decode
atob("SGVsbG8=") // → "Hello"
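atob("TWFu") // → "Man"
The catch: btoa() only accepts characters in the Latin-1 range (code points 0–255). Any character above that (emoji, Chinese, Arabic, or symbols such as €) throws DOMException: Failed to execute 'btoa': The string to be encoded contains characters outside of the Latin1 range. Accented letters such as ü do fall inside Latin-1, so they don't throw, but btoa() encodes them as single Latin-1 bytes rather than UTF-8, which is rarely what the receiving side expects.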
// Encode Unicode (encode to UTF-8 bytes first)
// Note: escape()/unescape() are deprecated; TextEncoder/TextDecoder is the modern route
function b64Encode(str) {
  return btoa(unescape(encodeURIComponent(str)));
}
// Decode Unicode
function b64Decode(str) {
  return decodeURIComponent(escape(atob(str)));
}
b64Encode("Hello 🌍") // → "SGVsbG8g8J+MjQ=="
b64Decode("SGVsbG8g8J+MjQ==") // → "Hello 🌍"
// URL-safe variants
function b64UrlEncode(str) {
  return b64Encode(str)
    .replace(/\+/g, '-')
    .replace(/\//g, '_')
    .replace(/=+$/, ''); // strip padding
}
function b64UrlDecode(str) {
  // Restore the standard alphabet, then re-add padding
  let s = str.replace(/-/g, '+').replace(/_/g, '/');
  while (s.length % 4) s += '=';
  return b64Decode(s);
}
Base64 in Python
import base64
# Standard encode/decode
encoded = base64.b64encode(b"Hello")
# → b"SGVsbG8="
decoded = base64.b64decode("SGVsbG8=")
# → b"Hello"
# URL-safe variant
url_enc = base64.urlsafe_b64encode(b"Hello")
# → b"SGVsbG8=" (same here; differs only when + or / would appear)
# Encode a string (not bytes)
s = "Hello 🌍"
encoded = base64.b64encode(s.encode('utf-8'))
# Decode back to string
decoded = base64.b64decode(encoded).decode('utf-8')
# → "Hello 🌍"
Data URIs: Embedding Files in HTML and CSS
A data URI embeds a file's complete contents as a Base64 string directly in an attribute or stylesheet. The format is:
data:[mediatype][;base64],[data]
<!-- Example: small PNG in an img tag -->
<img src="data:image/png;base64,iVBORw0KGgo..." />
/* Example: font in CSS */
@font-face {
  font-family: 'MyFont';
  src: url('data:font/woff2;base64,d09GMgAB...');
}
When data URIs make sense
- Email templates: email clients often block external image requests; embedding avoids the request, though some clients strip data URIs, so test in the clients you target
- Single-file HTML: self-contained pages that must include assets without external files
- Small icons: tiny SVG icons or 1×1 placeholder pixels where the HTTP request overhead is larger than the image
- Critical above-the-fold images: embedding a tiny hero thumbnail so it renders without waiting on an extra request
When they don't
- Large images: the 33% size increase adds up, and the image can't be cached on its own, so users re-download it every time the containing HTML or CSS is fetched
- Images shared across multiple pages: a referenced file is cached once; an embedded one is duplicated in each page's HTML
- CSS background images for frequently visited pages: separate files with long cache headers are significantly faster
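For cases where embedding does make sense, building the URI is mechanical. The sketch below assumes you already have the file's bytes and know its media type; `to_data_uri` and the sample SVG are invented for illustration.

```python
import base64

def to_data_uri(data: bytes, media_type: str) -> str:
    """Build a Base64 data URI from raw bytes and a media type."""
    payload = base64.b64encode(data).decode("ascii")  # standard alphabet, padded
    return f"data:{media_type};base64,{payload}"

# A tiny inline SVG used purely as sample input.
svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
uri = to_data_uri(svg, "image/svg+xml")

print(uri)
print(len(svg), "bytes raw ->", len(uri), "characters as a data URI")
```

Comparing the two lengths printed at the end is a quick way to judge whether a given asset is small enough to be worth inlining.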
Base64 and JWTs
JSON Web Tokens are one of the most common places developers encounter Base64 in practice. A JWT has three URL-safe Base64-encoded parts separated by dots:
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
.eyJzdWIiOiJ1c2VyXzEyMyIsIm5hbWUiOiJBbGljZSIsImlhdCI6MTcxOTQwMDAwMH0
.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c
Decoded header: {"alg":"HS256","typ":"JWT"}
Decoded payload: {"sub":"user_123","name":"Alice","iat":1719400000}
Signature: HMAC-SHA256 of encoded header + "." + encoded payload
The header and payload are plain JSON encoded with URL-safe Base64 (no padding). They are not encrypted. Anyone can decode them with any Base64 decoder, including atob() once - and _ are swapped back to + and /. The signature is what proves the token hasn't been tampered with; verifying it requires the secret key.
Never store secrets, passwords, or sensitive PII in a JWT payload assuming the Base64 encoding "hides" it. The payload is fully readable by anyone who has the token. Use JWE (JSON Web Encryption) if the payload itself needs to be confidential.
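To show how little stands between a token and its contents, here is a minimal sketch that reads the header and payload of the example token using only Python's standard library. It deliberately does not verify the signature, so it is for inspection only; `decode_segment` is a name made up for this example.

```python
import base64
import json

def decode_segment(segment: str) -> dict:
    """Decode one unpadded base64url JWT segment into a dict. No signature check."""
    padded = segment + "=" * (-len(segment) % 4)  # restore stripped padding
    return json.loads(base64.urlsafe_b64decode(padded))

token = (
    "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9"
    ".eyJzdWIiOiJ1c2VyXzEyMyIsIm5hbWUiOiJBbGljZSIsImlhdCI6MTcxOTQwMDAwMH0"
    ".SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c"
)
header_b64, payload_b64, _signature = token.split(".")

print(decode_segment(header_b64))
print(decode_segment(payload_b64))
```

No key appears anywhere in this code, which is exactly the point: reading a JWT requires nothing but the token itself.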
HTTP Basic Authentication
HTTP Basic Auth sends credentials in the Authorization header. The format is the username and password joined with a colon, then Base64-encoded:
// Credentials: alice:mysecretpassword
btoa("alice:mysecretpassword")
// → "YWxpY2U6bXlzZWNyZXRwYXNzd29yZA=="
// HTTP header
Authorization: Basic YWxpY2U6bXlzZWNyZXRwYXNzd29yZA==
// Anyone who intercepts this header can run:
atob("YWxpY2U6bXlzZWNyZXRwYXNzd29yZA==")
// → "alice:mysecretpassword"
As the code above shows, Basic Auth credentials are trivially reversible from the header value. This is why HTTP Basic Auth must only be used over HTTPS. Without TLS, the credentials are exposed to anyone on the network.
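The same round trip can be sketched in Python. The two helper functions below are hypothetical names written for this example; they assume UTF-8 credentials and a username that contains no colon.

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build the value of an Authorization header for HTTP Basic Auth."""
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")

def parse_basic_auth(header_value: str) -> tuple:
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic auth header")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon only: the password may itself contain colons.
    username, _, password = decoded.partition(":")
    return username, password

header = basic_auth_header("alice", "mysecretpassword")
print(header)
print(parse_basic_auth(header))
```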
Base64 Is Not Security
This is the most common misconception. Base64 is purely a representation change — it moves the same bits around into a different character set. It adds zero confidentiality, zero integrity protection, and zero authentication.
Here's the full picture:
| Property | Base64 | Encryption (AES-256) | Hashing (bcrypt) |
|---|---|---|---|
| Reversible | Yes — by anyone | Yes — with key only | No |
| Hides content | No | Yes | Yes (one-way) |
| Requires key | No | Yes | No (uses a per-password salt, which is not a secret key) |
| Use for passwords | Never | No | Yes |
| Primary purpose | Binary → text transport | Confidentiality | Verification |
Storing passwords as Base64 in a database is as insecure as storing them in plain text. Any attacker who reads the database can decode every password instantly. Use bcrypt, scrypt, or Argon2 for passwords.
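The contrast can be sketched with Python's standard library. This example uses `hashlib.scrypt` (available when Python is built against OpenSSL 1.1 or later); the cost parameters shown are illustrative assumptions rather than a tuned recommendation, and `hash_password` and `verify` are names invented here.

```python
import base64
import hashlib
import hmac
import os

password = b"mysecretpassword"

# Base64: anyone holding the stored value recovers the password.
stored_b64 = base64.b64encode(password)
assert base64.b64decode(stored_b64) == password

def hash_password(pw: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte scrypt hash; there is no corresponding decode step."""
    return hashlib.scrypt(pw, salt=salt, n=2**14, r=8, p=1, dklen=32)

salt = os.urandom(16)  # random per-password salt, stored next to the hash
stored_hash = hash_password(password, salt)

def verify(candidate: bytes) -> bool:
    # Re-derive with the same salt and compare in constant time.
    return hmac.compare_digest(hash_password(candidate, salt), stored_hash)

print(verify(b"mysecretpassword"), verify(b"wrong guess"))
```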
5 Common Base64 Mistakes
- Using btoa() with Unicode input. btoa() only accepts Latin-1 characters. Strings with emoji or other characters outside Latin-1 throw exceptions. Use btoa(unescape(encodeURIComponent(str))) instead.
- Forgetting URL-safe substitutions. Standard Base64 in a URL or cookie silently corrupts when + is interpreted as a space. Always use the -_ variant in URLs.
- Treating Base64 as encryption. Covered above: it provides no security. Don't use it to "obscure" sensitive values.
- Embedding large files as data URIs. Any image over ~10 KB is better served as a separate file with HTTP caching. An embedded image can't be cached on its own and is re-downloaded with every copy of the page or stylesheet that contains it.
- Forgetting to re-add padding when decoding URL-safe Base64. URL-safe Base64 often strips the trailing = characters. When decoding, add them back: pad until str.length % 4 === 0.
Base64 vs Hex Encoding
Hex encoding is another common way to represent binary data as text. Instead of 64 characters, it uses 16 (0–9 and a–f), representing each byte as two hex digits.
| Property | Base64 | Hex |
|---|---|---|
| Characters used | 64 (A–Z, a–z, 0–9, +, /) | 16 (0–9, a–f) |
| Size overhead | ~33% | ~100% (doubles) |
| Readability | Opaque | Easier to inspect bytes |
| Common uses | Files, tokens, data URIs, JWT | Cryptographic digests, color codes, memory addresses |
| URL-safe variant | Yes (RFC 4648 §5) | Not needed: no special characters |
Choose Base64 when compactness matters (API tokens, data URIs). Choose hex when human readability of individual bytes matters (cryptographic hashes, debugging binary protocols).
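A quick size comparison, sketched with a SHA-256 digest as sample binary data (any 32 bytes would give the same lengths):

```python
import base64
import hashlib

digest = hashlib.sha256(b"Hello").digest()  # 32 raw bytes

as_hex = digest.hex()
as_b64 = base64.b64encode(digest).decode("ascii")

# Raw bytes, hex characters, Base64 characters (including padding).
print(len(digest), len(as_hex), len(as_b64))  # 32 64 44

# Both representations round-trip to the same bytes.
assert bytes.fromhex(as_hex) == digest
assert base64.b64decode(as_b64) == digest
```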
Quick Reference
// Unicode-safe encode
const encode = str => btoa(unescape(encodeURIComponent(str)));
// Unicode-safe decode
const decode = str => decodeURIComponent(escape(atob(str)));
// URL-safe encode (no padding)
const urlEncode = str => encode(str)
  .replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
// URL-safe decode
const urlDecode = str => {
  let s = str.replace(/-/g, '+').replace(/_/g, '/');
  while (s.length % 4) s += '=';
  return decode(s);
};
// Node.js (Buffer): encode
Buffer.from('Hello 🌍').toString('base64');
// → "SGVsbG8g8J+MjQ=="
// Decode
Buffer.from('SGVsbG8g8J+MjQ==', 'base64').toString('utf8');
// → "Hello 🌍"
// URL-safe (Node 16+)
Buffer.from('Hello').toString('base64url');
// → "SGVsbG8"