Base64 Encoder / Decoder

Why Base64 Inflates Data by About 33% (The Math)

Base64 takes binary data (a stream of 8-bit bytes) and re-encodes it as text using only 64 distinct characters: A–Z, a–z, 0–9, plus two extras (typically + and /). The choice of 64 is deliberate: 64 is exactly 2^6, so each output character represents 6 bits of input. The encoder re-groups the source bitstream into 6-bit chunks, then maps each chunk to one of the 64 alphabet characters.

The size overhead falls out of the arithmetic. Three input bytes hold 24 bits, which divide evenly into four 6-bit groups, so every 3 input bytes become 4 output characters: a 4/3 ratio, or +33.33% size. If the input length is not a multiple of 3, the encoder pads the final group with zero bits and appends one or two = characters so the receiver knows how many real bytes the last quartet holds. One = means the last quartet encodes 2 source bytes; two (==) means it encodes 1 source byte; no = means the input ended on a clean 3-byte group.

This explains a few common observations. A padded Base64 string's length is always a multiple of 4. The padding is functionally optional in many implementations (a decoder can deduce it from the length modulo 4), which is why you see "padding-stripped" Base64 in JWTs and other URL-safe contexts. RFC 4648 requires the padding unless the specification that uses Base64 explicitly says otherwise; lots of real software is lenient on input but strict on output.

The 33% number is a lower bound that large inputs approach. With padding, the output is 4 × ceil(n / 3) characters, so the overhead is exactly 33.33% only when the input length is a multiple of 3. For very small inputs it is much higher: encoding 1 byte produces 4 characters, a 300% blow-up for that byte. For inputs of a few hundred bytes or more, padding adds at most 2 characters and the overhead stays within about one percentage point of 33.33%.

Denser encodings do exist inside printable ASCII: Base85 (Ascii85, used in Adobe PostScript and PDF) reaches 25% overhead, but its alphabet includes characters like backslash and quote that need escaping in JSON or URLs. Base64 hits the practical sweet spot of compactness vs portability.
Input bytes:    M       a       n
                |_______|_______|_______|
Binary:        01001101 01100001 01101110
Re-grouped:    010011 010110 000101 101110
Lookup A-Z..:  T      W       F      u
Output:        "TWFu"
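
The regrouping in the diagram above can be sketched by hand in a few lines. This is only an illustration of the 3-bytes-to-4-characters step; encodeBase64 is a hypothetical helper name, and in real code the built-in btoa or Buffer is the better choice.

```javascript
// Standard Base64 alphabet from RFC 4648: index 0-63 maps to one character.
const ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Hypothetical helper for illustration; takes a Uint8Array of input bytes.
function encodeBase64(bytes) {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = Math.min(3, bytes.length - i); // real bytes in this group
    // Pack up to 3 bytes into one 24-bit number, zero-filling missing bytes.
    const chunk =
      (bytes[i] << 16) | ((bytes[i + 1] ?? 0) << 8) | (bytes[i + 2] ?? 0);
    // Slice the 24 bits into four 6-bit indices, most significant first.
    out += ALPHABET[(chunk >> 18) & 63];
    out += ALPHABET[(chunk >> 12) & 63];
    // Positions that hold no real input bits become '=' padding.
    out += remaining > 1 ? ALPHABET[(chunk >> 6) & 63] : '=';
    out += remaining > 2 ? ALPHABET[chunk & 63] : '=';
  }
  return out;
}

console.log(encodeBase64(new TextEncoder().encode('Man'))); // "TWFu"
```

Working on a 24-bit integer per group keeps the bit slicing to plain shifts and masks, which mirrors the diagram one-to-one.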

// 1 byte: pads with ==
btoa("M");   // "TQ=="

// 2 bytes: pads with =
btoa("Ma");  // "TWE="

// 3 bytes: clean
btoa("Man"); // "TWFu"

// length always % 4 === 0
btoa("hello world").length;  // 16
btoa("a").length;            // 4 (was 1 byte → 4 chars)
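
The length arithmetic can be checked directly. The helper names below (paddedLength, unpaddedLength, overheadPercent) are made up for this sketch; they simply restate the 3-bytes-to-4-characters rule as formulas.

```javascript
// Hypothetical helpers: encoded length in characters for n input bytes.
function paddedLength(n) {
  return 4 * Math.ceil(n / 3); // every started 3-byte group emits 4 chars
}

function unpaddedLength(n) {
  return Math.ceil((4 * n) / 3); // 6 bits per char, rounded up, no '='
}

// Size overhead of the padded form relative to the input, as a percentage.
function overheadPercent(n) {
  return ((paddedLength(n) - n) / n) * 100;
}

for (const n of [1, 2, 3, 11, 300, 301]) {
  console.log(
    n + ' bytes -> ' + paddedLength(n) + ' padded, ' +
    unpaddedLength(n) + ' unpadded, +' + overheadPercent(n).toFixed(2) + '%'
  );
}
```

Running it shows the overhead starting at 300% for a single byte and settling close to 33% once the input reaches a few hundred bytes.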

URL-safe Base64 vs Standard Base64, and the btoa Trap

RFC 4648 defines two alphabets. Standard Base64 uses + and / for the last two characters and = for padding. URL-safe Base64 (sometimes "Base64URL") replaces + with - and / with _, and typically strips the trailing = padding. The reason is simple: +, /, and = all have reserved meaning in URLs, in HTTP form bodies, and in some filesystems. A standard Base64 string dropped into a query parameter without escaping breaks at the first +, which the server decodes to a literal space.

This is why JWTs use Base64URL exclusively for header / payload / signature segments. The token has to ride inside HTTP Authorization headers, cookies, and URLs, all of which interact badly with the standard alphabet. Use the wrong dialect and you get either a parse error on the receiving end or, worse, a token that "almost works" because the server happens to turn + into a space in some code paths but not others.

The browser's btoa / atob predate URL-safe Base64 and handle only the standard alphabet. Worse, btoa accepts only Latin-1 input: passing a string with any character above U+00FF (most CJK, emoji, and so on) throws InvalidCharacterError. Characters in the U+0080–U+00FF range do not throw, but they are encoded as single Latin-1 bytes rather than UTF-8, which is rarely what the receiver expects. The correct path in 2026 is to encode strings to UTF-8 bytes first via TextEncoder, then Base64-encode the bytes, then optionally translate to URL-safe and strip padding. Reverse on the way out.

Node.js handles this more cleanly: Buffer.from(str, 'utf-8').toString('base64url') produces URL-safe Base64 directly, no padding, no charset surprises. In browsers, Uint8Array.prototype.toBase64() and the static Uint8Array.fromBase64(), from a TC39 proposal that has reached at least Stage 3 as of 2026, finally give a clean built-in. Until they land in your minimum supported browser, the TextEncoder + btoa + replace dance below is the canonical workaround.

A quick safety note: never treat Base64 as protection in the security sense. It is not encryption and not even meaningful obfuscation. Anyone with five seconds and any tool can recover the bytes. Use it for transport, never as a substitute for actual cryptography.
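
The "+ becomes a space" failure is easy to reproduce with the standard URLSearchParams parser. A minimal sketch; the parameter name data is just a placeholder.

```javascript
// Bytes 0xFB 0xFF encode to a string containing both + and /.
const standard = btoa(String.fromCharCode(0xfb, 0xff)); // "+/8="

// Pasted raw into a query string, the + is parsed as a space.
const raw = new URLSearchParams('data=' + standard).get('data');

// Percent-encoding the value first lets it survive the round trip.
const escaped = new URLSearchParams(
  'data=' + encodeURIComponent(standard)
).get('data');

console.log(JSON.stringify(raw));     // " /8="
console.log(JSON.stringify(escaped)); // "+/8="
```

Base64URL sidesteps the problem entirely, because - and _ pass through query strings untouched.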
// Browser: UTF-8 safe Base64URL (no padding)
function toBase64Url(str) {
  const bytes = new TextEncoder().encode(str);
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function fromBase64Url(s) {
  s = s.replace(/-/g, '+').replace(/_/g, '/');
  while (s.length % 4) s += '=';
  const bin = atob(s);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return new TextDecoder().decode(bytes);
}

// btoa breaks on non-Latin-1
btoa('한글');                  // InvalidCharacterError
toBase64Url('한글');           // "7ZWc6riA"  (works)

// Node 16+
Buffer.from('한글', 'utf-8').toString('base64url'); // "7ZWc6riA"
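
One way to adopt the built-in gradually is to feature-detect it and fall back to the btoa route. The wrapper names below are hypothetical, and the option names (alphabet, omitPadding) are taken from the TC39 proposal text, so treat this as a sketch and verify against your target engines.

```javascript
// Hypothetical wrapper: prefer the built-in where present, else fall back.
function bytesToBase64Url(bytes) {
  if (typeof bytes.toBase64 === 'function') {
    // Option names as given in the TC39 proposal.
    return bytes.toBase64({ alphabet: 'base64url', omitPadding: true });
  }
  let bin = '';
  for (const b of bytes) bin += String.fromCharCode(b);
  return btoa(bin).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// Hypothetical wrapper for the reverse direction; returns a Uint8Array.
function base64UrlToBytes(s) {
  if (typeof Uint8Array.fromBase64 === 'function') {
    return Uint8Array.fromBase64(s, { alphabet: 'base64url' });
  }
  let std = s.replace(/-/g, '+').replace(/_/g, '/');
  while (std.length % 4) std += '='; // restore stripped padding for atob
  return Uint8Array.from(atob(std), (c) => c.charCodeAt(0));
}

const encoded = bytesToBase64Url(new TextEncoder().encode('한글'));
const decoded = new TextDecoder().decode(base64UrlToBytes(encoded));
console.log(encoded, decoded);
```

Keeping both wrappers byte-oriented (Uint8Array in, Uint8Array out) leaves the UTF-8 step explicit at the call site.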

When NOT to Use Base64 — Three Real Cases

Base64 has a niche, and developers regularly drag it outside that niche to their detriment. Three places it commonly does more harm than good:

(1) Inlining images via data URLs. It is tempting to embed a 4 KB icon as <img src="data:image/png;base64,...">. The byte cost is +33%, but the real damage is cache invalidation. An inlined asset is part of whatever document holds it, so a single icon update busts the cache for the entire HTML document or CSS file it lives in. Worse, the browser cannot share an icon inlined in HTML between pages: each page re-downloads the bytes inline. As a rough rule of thumb, the crossover point is around 1–2 KB on HTTP/2 and HTTP/3; above that, the per-request overhead of an external file is small enough that inlining rarely wins. Reach for inlining only for tiny SVGs / data URIs in CSS that are guaranteed not to change and are guaranteed to appear on the same critical-path page.

(2) "Encoding" structured data into a Base64 blob to "save space." Base64 makes data 33% larger, not smaller. If you find yourself doing JSON.stringify, then Base64, then gzip, drop the middle step. Gzip works on raw text fine. The Base64 layer inflates the input by a third and scrambles byte-aligned repetition (a repeated substring encodes differently depending on its offset modulo 3), so you pay for the encoding time and usually get a worse compression ratio than gzipping the raw JSON. Base64 is for moving binary across a text-only channel; if your channel is already text, you do not need it.

(3) Storing JWT or session tokens in URLs that get logged. The Base64URL alphabet is URL-safe but it is not log-safe in any privacy sense. Web servers, proxies, and CDNs default to logging full request URLs. A "logout link" that stuffs a session token into ?token= will leak that token into access logs forever, and the Base64 wrapper does nothing to protect it. Either accept that the value is public (one-time-use, short-lived) or move it to an HTTP body, an HttpOnly cookie, or an Authorization header that proxies are configured to redact.

The throughline: Base64 solves a transport problem, not a storage, security, or compression problem. Use it precisely when you have a binary payload and a text-only channel, and not anywhere else.
// BAD: data URL for a logo on every page
<img src="data:image/png;base64,iVBORw0KGgoAAAA...(20KB)..." />
// → busts HTML cache on every change, no cross-page reuse

// GOOD: external file with long cache TTL
<img src="/static/logo.v3.png" />
// → CDN caches once, reused everywhere

// BAD: redundant Base64 in a compression pipeline
// (gzip() stands in for whatever compression function you use)
const blob = btoa(JSON.stringify(data));
const viaBase64 = gzip(blob);  // gzip is now LESS effective

// GOOD: gzip the JSON directly
const direct = gzip(JSON.stringify(data));

// BAD: token in URL → leaked to access logs
<a href="/logout?token=eyJhbGciOi...">

// GOOD: Authorization header
fetch('/logout', { method: 'POST', headers: { Authorization: 'Bearer ' + token } });
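
The compression claim in case (2) can be measured rather than taken on faith. The sketch below assumes Node.js (it uses the built-in zlib module) and a made-up sample payload; exact byte counts will vary with the data, so none are quoted here.

```javascript
const zlib = require('zlib');

// Made-up sample data, only here to give gzip something realistic to work on.
const data = Array.from({ length: 500 }, (_, i) => ({
  id: i,
  name: 'user' + i,
  active: i % 2 === 0,
}));
const json = JSON.stringify(data);

// Path A: gzip the raw JSON text.
const direct = zlib.gzipSync(Buffer.from(json, 'utf-8'));

// Path B: Base64-encode first, then gzip the Base64 text.
const base64Text = Buffer.from(json, 'utf-8').toString('base64');
const viaBase64 = zlib.gzipSync(Buffer.from(base64Text, 'ascii'));

console.log('raw JSON bytes:     ', Buffer.byteLength(json));
console.log('gzip(JSON):         ', direct.length);
console.log('gzip(base64(JSON)): ', viaBase64.length);
```

On repetitive JSON like this, the second number is typically far smaller than the third, which is the argument for dropping the Base64 step.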

About this tool

Base64 is a binary-to-text encoding scheme that represents binary data using a 64-character ASCII alphabet. It is widely used to transport binary content over text-only channels like email (MIME), HTTP headers, JSON payloads, and data URLs in HTML and CSS. Encoding adds about 33% size overhead but guarantees the data survives any text-safe transport.

How to use

  1. Choose Encode to convert plain text into Base64, or Decode to recover the original text from Base64.
  2. Paste your input — UTF-8 text, JSON, or any string — into the input box.
  3. The output updates automatically as you type; no submit button required.
  4. Click the swap arrow to flip input and output and toggle the mode for round-trip checks.
  5. Press Copy to grab the result and paste it into your code, terminal, or HTTP client.

Common use cases

  • Embedding small images directly in CSS or HTML using data: URLs to skip an extra HTTP request.
  • Encoding HTTP Basic Auth credentials in the Authorization header.
  • Storing or transmitting binary file content inside JSON payloads.
  • Encoding email attachments in MIME messages so SMTP relays do not corrupt them.
  • Inspecting JWT segments, since the header and payload are Base64URL-encoded JSON.
  • Sending binary data through systems that strip or mangle non-printable characters.
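
Two of the use cases above, Basic Auth and JWT inspection, fit in a short sketch. Both helper names are hypothetical, the credentials are the well-known example pair from the HTTP Basic Auth specification, and the JWT segment is a commonly seen HS256 header.

```javascript
// Hypothetical helper: build an HTTP Basic Auth header value.
// Assumes Latin-1 credentials, since btoa rejects characters above U+00FF.
function basicAuthHeader(user, password) {
  return 'Basic ' + btoa(user + ':' + password);
}

// Hypothetical helper: decode one JWT segment (Base64URL, no padding) to JSON.
// This only inspects the segment; it does not verify the token's signature.
function decodeJwtSegment(segment) {
  let s = segment.replace(/-/g, '+').replace(/_/g, '/');
  while (s.length % 4) s += '='; // restore stripped padding for atob
  const bytes = Uint8Array.from(atob(s), (c) => c.charCodeAt(0));
  return JSON.parse(new TextDecoder().decode(bytes));
}

console.log(basicAuthHeader('Aladdin', 'open sesame'));
console.log(decodeJwtSegment('eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9'));
```

Note that both values are readable by anyone who sees them, which is why Basic Auth is only reasonable over HTTPS.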

Frequently asked questions

Q. Is Base64 encryption?

A. No. Base64 is purely an encoding — anyone can decode it instantly. Never use it to hide secrets; use AES or another cipher for that.

Q. Why does my Base64 string have = at the end?

A. The = characters are padding so the output length is a multiple of 4. Implementations may omit padding (URL-safe Base64) but standard Base64 keeps it.

Q. What is Base64URL and how is it different?

A. Base64URL replaces + with - and / with _ so the output is safe inside URLs and filenames. JWTs use this variant.

Q. How much does Base64 inflate my data?

A. Roughly 33% larger than the original binary, since every 3 bytes become 4 ASCII characters.