Why MD5 and SHA-1 Are Broken — and What "Broken" Actually Means
MD5 was designed by Ron Rivest in 1991. SHA-1 was designed by the NSA in 1995. Both are general-purpose cryptographic hash functions, both produce a fixed-size digest, and both have been considered "broken" for years. But "broken" needs a definition, because the word covers two very different attacks.
A collision attack means an adversary can find any two different inputs A and B that produce the same hash; the adversary chooses both. A second-preimage attack means an adversary, given a fixed input A, can find a different B that hashes to the same value. The second is much harder because the target is not under the attacker's control. MD5 has been collision-broken since 2004, and practical collisions now take seconds on a laptop. SHA-1's first practical collision was demonstrated by the "SHAttered" attack from Google and CWI Amsterdam in February 2017: they produced two distinct PDF files with identical SHA-1 hashes, at a cost of about 6,500 CPU-years plus 110 GPU-years of computation, which the authors estimated at roughly $110,000 of GPU compute on AWS. The 2020 "SHA-1 is a Shambles" paper went further and demonstrated a chosen-prefix collision, a stronger attack, for an estimated $45,000. Neither MD5 nor SHA-1 has a practical second-preimage attack, but attacks only improve over time, and "the adversary never gets to choose both inputs" is a fragile property to rely on.
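The gap between the two attacks can be felt on a toy scale. The sketch below is illustrative only: it truncates SHA-256 to 16 bits (an assumption made purely so the search finishes instantly) and then runs both searches. The helper names truncatedHash, findCollision, and findSecondPreimage are hypothetical and not part of any library.

```javascript
import { createHash } from 'node:crypto';

// Toy hash: the first `bytes` bytes of SHA-256, hex-encoded.
// Truncation exists only to make brute force feasible in a demo.
function truncatedHash(input, bytes = 2) {
  return createHash('sha256').update(input).digest('hex').slice(0, bytes * 2);
}

// Collision search: the attacker picks BOTH inputs, so any repeated
// digest counts. Expected work is about 2^(n/2) for an n-bit digest.
function findCollision(bytes = 2) {
  const seen = new Map(); // digest -> first input that produced it
  for (let i = 0; ; i++) {
    const input = `msg-${i}`;
    const digest = truncatedHash(input, bytes);
    if (seen.has(digest)) {
      return { a: seen.get(digest), b: input, digest, attempts: i + 1 };
    }
    seen.set(digest, input);
  }
}

// Second-preimage search: the target is fixed, so only one specific
// digest counts. Expected work is about 2^n.
function findSecondPreimage(target, bytes = 2) {
  const wanted = truncatedHash(target, bytes);
  for (let i = 0; ; i++) {
    const candidate = `alt-${i}`;
    if (candidate !== target && truncatedHash(candidate, bytes) === wanted) {
      return { candidate, digest: wanted, attempts: i + 1 };
    }
  }
}

const collision = findCollision();
const second = findSecondPreimage('fixed-target');
console.log('collision attempts:', collision.attempts);
console.log('second-preimage attempts:', second.attempts);
```

With a 16-bit digest the collision search expects a few hundred attempts and the second-preimage search expects tens of thousands. The same square-root relationship holds at full digest size, which is why collision resistance is the first property to fail in practice.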
The practical impact: stop using MD5 or SHA-1 for any security purpose. The real-world incidents are not theoretical. The Flame malware (2012, attributed to a state actor) abused an MD5 collision in the Microsoft Terminal Services certificate signing chain to forge a valid Windows code-signing certificate. Git still uses SHA-1 for content addressing by default, and the SHAttered authors pointed out that colliding files could make two different repository objects share a hash. Git responded in version 2.13 (2017) by switching to a collision-detecting SHA-1 implementation, and it has been adding SHA-256 as an alternative object format. The CA/Browser Forum barred new SHA-1 TLS certificates from 2016, and major browsers stopped trusting them in 2017. PGP, S/MIME, and most code-signing systems have followed.
What is still acceptable for MD5 and SHA-1? Non-adversarial integrity checking. If you are computing a hash to detect random corruption (a CDN file vs the origin, a downloaded artifact vs a published checksum) and there is no incentive for an attacker to forge a collision, MD5 and SHA-1 still work. They are also still appropriate as fingerprints inside data structures where collision resistance is not the security model. Git's SHA-1 in 2026 mostly survives because Git's threat model largely assumes that the producer is trusted and treats content addressing as a convenience.
Modern recommended choices: SHA-256 or SHA-3-256 for any new system, with BLAKE3 as the fast modern alternative when you need throughput. SHA-256 has stood since 2001 with no meaningful weakening; the cryptographic community is comfortable putting another 20+ years of trust on it. SHA-3 (Keccak, standardized 2015) is structurally different from SHA-2, so even if a flaw is found in SHA-2, SHA-3 is unlikely to share it. BLAKE3 is roughly 5x faster than SHA-256 in software, has the same security level, and is a great fit for very large files.
# MD5 collision in seconds (e.g., HashClash's fastcoll)
fastcoll -p input.bin -o a.bin b.bin
md5sum a.bin b.bin
# Illustrative output; the actual digest depends on the prefix file:
# 008ee33a9d58b51cfeb425b0959121c9  a.bin
# 008ee33a9d58b51cfeb425b0959121c9  b.bin   <-- different files, same MD5
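// Modern hashing in Node 20+
import { createHash } from 'node:crypto';
const buf = Buffer.from('example input'); // any Buffer, TypedArray, or string
createHash('sha256').update(buf).digest('hex'); // recommended default
createHash('sha3-256').update(buf).digest('hex'); // SHA-3 (Keccak)
createHash('blake2b512').update(buf).digest('hex'); // BLAKE2b, 512-bit digest
// BLAKE3 is not in node:crypto; @noble/hashes is one portable pure-JS option
// (newer releases of the package may need the path '@noble/hashes/blake3.js')
import { blake3 } from '@noble/hashes/blake3';
blake3(buf, { dkLen: 32 }); // 32-byte digest as a Uint8Array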
// Decision:
// - Adversary involved (signatures, content auth)? → SHA-256 / SHA-3 / BLAKE3
// - Just detecting random corruption (CRC-style)? → MD5/SHA-1 still OK
// - Speed matters on huge files (TB-scale)? → BLAKE3
Argon2id vs scrypt vs bcrypt — Modern Password Hashing
For passwords, plain SHA-256 is wrong. SHA-256 is designed to be fast — billions of hashes per second on a GPU. The whole point of password hashing is to be slow enough that a stolen database is useless to brute-force. Three families have dominated since the 2000s, and OWASP's 2026 password storage cheat sheet ranks them in this order:
Argon2id (2015 winner of the Password Hashing Competition; standardized in RFC 9106, 2021). Three knobs: memory cost (m, in KiB), time cost (t, iterations), parallelism (p, threads). The "id" variant blends data-dependent and data-independent passes for resistance to both side-channel and time-memory tradeoff attacks. OWASP's 2026 baseline: m=19456 (19 MiB), t=2, p=1, OR m=12288, t=3, p=1, OR m=7168, t=5, p=1 — pick whichever your servers can sustain at peak login load. Argon2id is the unambiguously preferred choice for new systems. Major libraries: argon2 (Node), argon2-cffi (Python), org.bouncycastle.crypto.generators.Argon2BytesGenerator (Java).
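scrypt (2009, Colin Percival). Three knobs: N (CPU/memory cost factor, must be a power of 2), r (block size), p (parallelization). Recommended (2026): N = 2^17, r=8, p=1, which works out to about 128 MiB (memory ≈ 128 × N × r bytes). scrypt is memory-hard, but its memory access pattern depends on the password, so it lacks Argon2id's side-channel resistance, and N scales time and memory together, so the two cannot be tuned separately. It remains acceptable if you are stuck with it for compatibility.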
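The scrypt parameters above map directly onto Node's built-in crypto.scryptSync. The sketch below shows one possible wiring and is not a drop-in password library: hashPasswordScrypt and verifyPasswordScrypt are hypothetical helper names, and the returned object is an ad hoc storage format chosen for illustration. The one detail worth noticing is maxmem: Node's default limit is 32 MiB, which is below what N = 2^17 with r = 8 needs, so the call fails unless the limit is raised.

```javascript
import { randomBytes, scryptSync, timingSafeEqual } from 'node:crypto';

// scrypt's working memory is roughly 128 * N * r bytes.
function scryptMemoryBytes(N, r) {
  return 128 * N * r;
}

// Leave headroom above the estimate so the maxmem check passes.
function maxmemFor(N, r) {
  return scryptMemoryBytes(N, r) + 32 * 1024 * 1024;
}

// Hypothetical helper: derive a 32-byte key and keep the parameters
// next to it so that verification can reproduce the derivation.
function hashPasswordScrypt(password, { N = 2 ** 17, r = 8, p = 1 } = {}) {
  const salt = randomBytes(16);
  const key = scryptSync(password, salt, 32, { N, r, p, maxmem: maxmemFor(N, r) });
  return { salt, key, N, r, p };
}

function verifyPasswordScrypt(password, stored) {
  const { salt, key, N, r, p } = stored;
  const candidate = scryptSync(password, salt, key.length, {
    N, r, p, maxmem: maxmemFor(N, r),
  });
  return timingSafeEqual(candidate, key); // constant-time comparison
}

console.log(scryptMemoryBytes(2 ** 17, 8) / 2 ** 20); // 128 (MiB)
```

scryptSync blocks the event loop for the whole derivation, so a server would more likely use the asynchronous crypto.scrypt with the same options.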
bcrypt (1999, Niels Provos and David Mazières). One knob: cost factor (work factor; each increment doubles the cost). Recommended (2026): cost = 12 or 13, depending on your hardware budget; anything below 10 is too fast in 2026. bcrypt has three weaknesses: a 72-byte input cap (most implementations silently truncate longer passwords), a small fixed memory footprint of about 4 KiB, and as a result little resistance to FPGA or ASIC crackers. It is fine for legacy systems but not what you would pick today. PBKDF2 dates from the same era and is weaker still because it uses almost no memory; it is only acceptable when FIPS compliance forces it.
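The 72-byte cap is easy to misjudge because it counts UTF-8 bytes, not characters. A minimal guard, assuming the application prefers to reject over-long passwords rather than let them be truncated, could look like this (exceedsBcryptLimit is a hypothetical helper name):

```javascript
const BCRYPT_MAX_BYTES = 72;

// Returns true when bcrypt would ignore part of this password.
// The limit applies to the UTF-8 encoding, not to the character count.
function exceedsBcryptLimit(password) {
  return Buffer.byteLength(password, 'utf8') > BCRYPT_MAX_BYTES;
}

console.log(exceedsBcryptLimit('a'.repeat(72)));      // false
console.log(exceedsBcryptLimit('a'.repeat(73)));      // true
console.log(exceedsBcryptLimit('\u00e9'.repeat(40))); // true: 40 characters, 80 bytes
```

Rejecting at registration is the simplest policy. Pre-hashing the password before bcrypt is another common workaround, but it has its own pitfalls and is not shown here.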
A critical implementation note shared by all three: never write your own hash comparison. The library's verify function parses the salt and parameters out of the stored string, recomputes the hash, and compares the result in constant time to avoid timing attacks. A naive == on the encoded digests can leak how many leading characters matched. Use argon2.verify(hash, password) or bcrypt.compare(password, hash), and never the equality operator on the hashes themselves.
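For password hashes the library's verify function is the right tool. The same timing concern applies whenever raw digests have to be compared by hand, for example HMAC tags on a webhook. For that case Node ships crypto.timingSafeEqual. The sketch below wraps it for hex strings; safeEqualHex is a hypothetical helper name, and the key and messages are made-up sample values.

```javascript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Compare two hex-encoded digests without stopping at the first mismatch.
function safeEqualHex(aHex, bHex) {
  const a = Buffer.from(aHex, 'hex');
  const b = Buffer.from(bHex, 'hex');
  // timingSafeEqual throws on unequal lengths, so check first.
  // Digest length is public, so this early return reveals nothing useful.
  if (a.length !== b.length) return false;
  return timingSafeEqual(a, b);
}

// Sample values for illustration only.
const key = 'demo-key';
const expectedTag = createHmac('sha256', key).update('payload').digest('hex');
const forgedTag = createHmac('sha256', key).update('tampered').digest('hex');

console.log(safeEqualHex(expectedTag, expectedTag)); // true
console.log(safeEqualHex(expectedTag, forgedTag));   // false
```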
Migration strategy when moving to a stronger algorithm: you cannot rehash everyone at once, because you do not hold the plaintext passwords. Instead, on each user's next login, after verifying with the old algorithm, rehash with the new one and store the result. This is the "lazy migration" pattern. Accounts that never log in again keep their old hashes, so plan to expire those or force a reset after a cutoff date.
// Argon2id with OWASP 2026 baseline
import argon2 from 'argon2';
const hash = await argon2.hash(password, {
  type: argon2.argon2id,
  memoryCost: 19456, // 19 MiB
  timeCost: 2,
  parallelism: 1,
});
// Stored format embeds params:
// $argon2id$v=19$m=19456,t=2,p=1$<salt>$<hash>
const ok = await argon2.verify(hash, password); // constant-time
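// Lazy migration on login
import bcrypt from 'bcrypt';
// Same parameters as the Argon2id baseline
const ARGON2_OPTS = { type: argon2.argon2id, memoryCost: 19456, timeCost: 2, parallelism: 1 };
// `db` stands for the application's own data layer (not shown)
async function login(email, password) {
  const user = await db.users.findOne({ email });
  if (!user) return null;
  if (user.hash.startsWith('$argon2')) {
    if (!await argon2.verify(user.hash, password)) return null;
  } else if (user.hash.startsWith('$2')) { // bcrypt
    if (!await bcrypt.compare(password, user.hash)) return null;
    // Password is verified, so upgrade the stored hash now
    const newHash = await argon2.hash(password, ARGON2_OPTS);
    await db.users.update({ id: user.id }, { hash: newHash });
  } else {
    return null; // unknown hash format: fail closed instead of logging the user in
  }
  return user;
}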
Cryptographic vs Non-Cryptographic Hashes — Use the Right Class
Hash functions split into two families with very different design goals, and using one where the other belongs is a common bug.
Cryptographic hashes (SHA-256, SHA-3, BLAKE3, BLAKE2) prioritize collision resistance, preimage resistance, and second-preimage resistance against adversarial inputs. They are deliberately not the fastest possible. They are appropriate any time the input might be controlled by a malicious party — content addressing, signatures, hashes used as identifiers in security-relevant contexts, password hashing (with the special functions above), HMAC, key derivation. SHA-256 on modern x86 with hardware SHA extensions runs around 1.5–2 GB/s; BLAKE3 runs 5–10 GB/s; SHA-3 a bit slower than SHA-2.
Non-cryptographic hashes (xxHash, MurmurHash, CityHash, FNV, FarmHash) prioritize raw speed and good statistical distribution. They make NO guarantee against adversarial inputs: because the function is public and unkeyed (or its seed is guessable), an attacker can construct inputs that all land in the same bucket. xxHash's fastest variant (XXH3) runs at roughly 30+ GB/s on modern CPUs, well over 10x SHA-256, and that is the entire reason it exists. Use these for: hash tables, Bloom filters, dedup keys for non-adversarial data, integrity checks where you only worry about random corruption (a network packet, a memory bit flip), bucketing logs by client.
The bug pattern: using a non-cryptographic hash in an adversarial context. The classic 2003 Crosby & Wallach paper showed that hash tables in Perl and other widely deployed software could be degraded from O(1) to O(n) per operation by inputs crafted to collide. In 2011 the "hashDoS" disclosure applied the same idea to web stacks including PHP, Java, Python, and Ruby: a remote attacker who could submit form parameters could make every key collide and DoS the server. The fix that many languages adopted (the SipHash algorithm with a per-process random key) is a keyed hash designed as a cryptographic pseudorandom function: fast like a non-crypto hash, but secure as long as the key is unknown to the attacker.
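The attack and the fix can both be reproduced in a few lines. The sketch below implements 32-bit FNV-1a as the unkeyed hash, brute-forces keys that all land in bucket 0 of a 1024-bucket table, and then shows how the same keys scatter once the bucket index depends on a secret key. HMAC-SHA-256 is used as a stand-in for SipHash purely because Node's standard library does not expose SipHash; real runtimes use SipHash because it is far faster. The function names are hypothetical.

```javascript
import { createHmac, randomBytes } from 'node:crypto';

const BUCKETS = 1024;

// 32-bit FNV-1a: fast and unkeyed, so anyone can compute it offline.
function fnv1a32(str) {
  let h = 0x811c9dc5; // FNV offset basis
  for (const byte of Buffer.from(str, 'utf8')) {
    h ^= byte;
    h = Math.imul(h, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return h >>> 0;
}

// Attacker's offline step: brute-force keys that all land in one bucket.
function craftCollidingKeys(count, bucket = 0) {
  const keys = [];
  for (let i = 0; keys.length < count; i++) {
    const key = `k${i}`;
    if (fnv1a32(key) % BUCKETS === bucket) keys.push(key);
  }
  return keys;
}

// Keyed alternative: the bucket depends on a secret per-process key,
// so the attacker cannot predict it offline.
const processKey = randomBytes(16);
function keyedBucket(str, key = processKey) {
  const tag = createHmac('sha256', key).update(str).digest();
  return tag.readUInt32BE(0) % BUCKETS;
}

const craftedKeys = craftCollidingKeys(200);
const unkeyedBuckets = new Set(craftedKeys.map((k) => fnv1a32(k) % BUCKETS));
const keyedBuckets = new Set(craftedKeys.map((k) => keyedBucket(k)));
console.log('buckets hit, unkeyed:', unkeyedBuckets.size); // 1
console.log('buckets hit, keyed:', keyedBuckets.size);
```

Under the unkeyed hash every crafted key shares one bucket, so each insert or lookup walks a 200-entry chain. Under the keyed hash the same keys spread across the table, and the attacker would need the key to do better than chance.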
The reverse bug, using SHA-256 for a hash table, is just slow. It works correctly, but you waste CPU on cryptographic guarantees that the application does not need. A speed difference of 10x or more matters in inner loops and in language runtimes.
Quick decision tree: untrusted input AND collision matters for correctness or security → SHA-256 / SHA-3 / BLAKE3. Trusted or low-stakes random data AND speed matters → xxHash / MurmurHash. Hash table key for arbitrary user input → SipHash (the language usually picks this for you in Python 3.4+, Ruby, Rust). Password → Argon2id.
// CRYPTOGRAPHIC: adversary-resistant
import { createHash } from 'node:crypto';
createHash('sha256').update(buf).digest('hex');
// ~1.5 GB/s on modern x86

// NON-CRYPTOGRAPHIC: fast, NOT for security
import xxhash from 'xxhash-wasm';
const { h64Raw } = await xxhash(); // h64 takes a string, h64Raw takes a Uint8Array
h64Raw(buf); // 64-bit digest as a BigInt; the 30+ GB/s figure applies to native builds
// Use for: hashtables, dedup, sharding non-adversarial data

// HASH TABLE BUG: never use a plain non-crypto hash on attacker input
// (murmur() stands for any unkeyed non-cryptographic hash function)
class BadCache {
  constructor() { this.buckets = new Array(1024); }
  set(k, v) {
    const i = murmur(k) % 1024; // attacker can force all keys into the same bucket
    (this.buckets[i] ||= []).push([k, v]);
  }
}

// CORRECT: Map / Object use the runtime's own hardened hasher
const cache = new Map();
cache.set(userInput, value); // V8 seeds its string hashing randomly per process

// Non-adversarial integrity (e.g., copy across rack)
const a = h64Raw(file);
const b = h64Raw(receivedFile);
if (a !== b) console.log('corrupted in transit');
About this tool
A hash generator produces a fixed-length fingerprint of any input using algorithms like MD5, SHA-1, SHA-256, and SHA-512. Hashes are one-way: you cannot feasibly reverse a hash back to the original input. Developers use them to verify file integrity, deduplicate content, and generate cache keys. For storing password verifiers, use a dedicated password-hashing function such as Argon2id rather than a salted fast hash.
How to use
Type or paste your input — text, JSON, or any string — into the input box.
All four algorithms (MD5, SHA-1, SHA-256, SHA-512) compute their hashes at the same time and show them below.
Click Copy on any row to put that hash on your clipboard.
Compare the hash against a published checksum to verify a download.
Use SHA-256 or SHA-512 for anything security-sensitive; treat MD5 and SHA-1 as legacy only.
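The checksum comparison in the steps above can also be scripted when the file is too large to paste into a browser. This is a minimal sketch using only Node's standard library; sha256File and verifyDownload are hypothetical helper names, and the path and checksum in the usage comment are placeholders.

```javascript
import { createHash } from 'node:crypto';
import { createReadStream } from 'node:fs';

// Stream the file through SHA-256 so a large download is never held in memory.
async function sha256File(path) {
  const hash = createHash('sha256');
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}

// Published checksums are public values, so a plain string comparison is fine.
// Normalize case and whitespace because checksum files vary in formatting.
async function verifyDownload(path, publishedHex) {
  const actual = await sha256File(path);
  return actual === publishedHex.trim().toLowerCase();
}

// Usage (placeholder values):
// const ok = await verifyDownload('./installer.iso', '<published sha256 hex>');
// console.log(ok ? 'checksum matches' : 'checksum MISMATCH');
```

A matching checksum only proves the file equals what the publisher hashed. It protects against tampering only if the checksum itself came over a channel the attacker could not modify.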
Common use cases
Verifying that a downloaded ISO or installer matches the publisher’s checksum.
Generating cache keys (ETags) for HTTP responses.
Producing deterministic IDs for content-addressed storage like IPFS or git objects.
Checking whether two large blobs are identical without comparing them byte by byte.
Building a Merkle tree where each node hashes its children.
Computing HMAC inputs (combine with a key for authenticated message tags).
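As an illustration of the Merkle tree use case, the sketch below computes a root over a list of leaves with SHA-256. It is a simplified construction, not any specific protocol's: the 0x00 and 0x01 prefix bytes that keep leaves and interior nodes distinct are borrowed from RFC 6962, but the handling of an odd node (promoted unchanged to the next level) is an assumption made for brevity. merkleRoot is a hypothetical helper name.

```javascript
import { createHash } from 'node:crypto';

const sha256 = (data) => createHash('sha256').update(data).digest();

const LEAF_PREFIX = Buffer.from([0x00]); // marks leaf hashes
const NODE_PREFIX = Buffer.from([0x01]); // marks interior-node hashes

// Compute the Merkle root (hex) of an array of strings or Buffers.
function merkleRoot(leaves) {
  if (leaves.length === 0) throw new Error('need at least one leaf');
  // Level 0: hash every leaf with the leaf prefix.
  let level = leaves.map((leaf) =>
    sha256(Buffer.concat([LEAF_PREFIX, Buffer.from(leaf)])));
  // Pair up neighbours until a single hash remains.
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 < level.length) {
        next.push(sha256(Buffer.concat([NODE_PREFIX, level[i], level[i + 1]])));
      } else {
        next.push(level[i]); // odd node out is promoted unchanged
      }
    }
    level = next;
  }
  return level[0].toString('hex');
}

console.log(merkleRoot(['block-1', 'block-2', 'block-3']));
```

Changing any single leaf changes the root, which is what lets a verifier check one block against the root using only the sibling hashes along its path.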
Frequently asked questions
Q. Should I use MD5 for passwords?
A. No. MD5 and SHA-1 are broken for cryptographic use. For passwords use a slow KDF like Argon2id, bcrypt, or scrypt — not raw hashes.
Q. Why does the same input always produce the same hash?
A. That is the defining property of a hash function. It is what makes hashes useful for integrity checks and content addressing.
Q. Can two different inputs produce the same hash?
A. Yes — this is called a collision. SHA-256 and SHA-512 have no known practical collisions; MD5 and SHA-1 do.
Q. Are hashes encryption?
A. No. Hashes are one-way and have no key. Encryption is two-way (with a key, you can decrypt). Use AES or ChaCha20 for encryption.