SHA-256 vs MD5: Why Hash Algorithms Matter

Why this comparison still matters

Cryptographic hash functions are the quiet backbone of digital security — they protect passwords, verify file integrity, sign certificates, and anchor blockchains. MD5 and SHA-256 are the two hash algorithms developers encounter most often, and the gap between them is the gap between 'broken' and 'secure.' MD5 has been cryptographically broken since 2004, yet it still appears in checksums, legacy systems, and (alarmingly) in production password storage. This article explains exactly what each algorithm does, why MD5 is broken, and where each is still appropriate to use.

SHA-256: the modern standard

SHA-256 is part of the SHA-2 family, specified in FIPS 180-4 and described in RFC 6234. It was designed by the NSA and published in 2001 as the successor to SHA-1. It produces a 256-bit (32-byte, 64-character hex) digest from any input of any length. The algorithm processes input in 512-bit blocks through 64 rounds of compression, mixing, and rotation.
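
The digest size described above is easy to confirm. This is a minimal sketch using Python's standard hashlib module; the language and variable name are this example's choice, not part of the specification. It hashes the three-byte message "abc", a widely published SHA-256 test vector.

```python
import hashlib

# Hash the three-byte message "abc", a widely published SHA-256 test vector.
digest = hashlib.sha256(b"abc")

print(digest.digest_size)       # 32 bytes = 256 bits
print(len(digest.hexdigest()))  # 64 hex characters
print(digest.hexdigest())
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```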

SHA-256's security properties are what you want from a cryptographic hash: it is preimage-resistant (given a hash, you cannot feasibly find an input that produces it), second-preimage-resistant (given an input, you cannot feasibly find a different input with the same hash), and collision-resistant (you cannot feasibly find any two inputs with the same hash). No practical collision attack against SHA-256 has ever been demonstrated, and the best theoretical attacks remain far beyond available compute. SHA-256 underpins TLS certificate signing, Bitcoin's proof-of-work, Git's optional SHA-256 object format (experimental since Git 2.29; SHA-1 remains the default), fingerprints for newer OpenPGP key versions, and most modern code-signing pipelines.

Performance is more than adequate for most uses. A modern CPU with hardware SHA support (the SHA extensions, often called SHA-NI, on recent Intel and AMD processors, or the ARMv8 cryptography extensions on Apple Silicon) hashes a few gigabytes per second. Without hardware acceleration, pure-software SHA-256 still handles hundreds of megabytes per second, fast enough that hashing time is rarely the bottleneck. For password storage specifically, SHA-256 is too fast, but that is a different problem (see below).

MD5: fast, ubiquitous, broken

MD5 was designed by Ron Rivest in 1991 and published as RFC 1321 in 1992. It produces a 128-bit (16-byte, 32-character hex) digest, processing input in 512-bit blocks through 64 steps (four rounds of 16 operations each). For a decade it was the standard hash for everything from password storage to file integrity to TLS certificates.

Then it broke. Warning signs dated back to 1996, when Hans Dobbertin found a collision in MD5's compression function. In 2004, Xiaoyun Wang and colleagues published a full collision attack: two different inputs producing the same MD5 hash. Within a couple of years, refined versions of the attack were finding collisions in under a minute on an ordinary PC. By 2008, researchers had used MD5 collisions to forge a rogue CA certificate, turning the theoretical 'break' into a practical attack against real-world PKI. That same year, CMU's Software Engineering Institute (through its CERT/CC vulnerability notes) declared MD5 'cryptographically broken and unsuitable for further use.' The Flame malware (2012) then used an MD5 collision to forge a code-signing certificate that chained to Microsoft's Terminal Server licensing infrastructure, letting its code pose as legitimately signed Microsoft software.

So why is MD5 still everywhere? Three reasons. First, it is fast and simple to implement — a few hundred lines of code, no external dependencies, runs anywhere. Second, it is sufficient for non-adversarial integrity checks: if you want to verify that a downloaded file matches the publisher's published checksum and your only threat model is accidental corruption (a network glitch, a faulty disk), MD5 detects that perfectly well. Third, legacy systems still depend on it — many file formats, protocol specifications, and old databases encode MD5 hashes that would require migrations to replace.
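
A non-adversarial integrity check of this kind can be sketched in a few lines of Python. The function names, the 1 MiB chunk size, and the default algorithm are choices made for this illustration; only hashlib.new and the hash object's update and hexdigest methods come from the standard library.

```python
import hashlib

def file_digest_hex(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read 1 MiB at a time so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def matches_published(path, published_hex, algorithm="md5"):
    """Compare a file's digest with a checksum copied from the publisher."""
    return file_digest_hex(path, algorithm) == published_hex.strip().lower()
```

Passing algorithm="sha256" switches the same code to SHA-256, which is the better choice whenever the publisher offers both checksums.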

The danger is when MD5 is used in adversarial contexts. Storing passwords as MD5 hashes is catastrophic: a single high-end GPU computes tens of billions of MD5 hashes per second, so dictionary words and common passwords fall almost instantly, and even the full space of 8-character printable-ASCII passwords can be exhausted in hours to days. Using MD5 for file integrity where an attacker can substitute a colliding file defeats the purpose. Using MD5 for digital signatures is what enabled the Flame attack.

Why hashing is not password storage

A common confusion is treating a fast hash function as password storage. A plain SHA-256 hash of a common password is trivially guessable: a single modern GPU (NVIDIA RTX 4090) computes on the order of 20 billion SHA-256 hashes per second with hashcat, so a dictionary of the top 10 million passwords is exhausted in milliseconds, and the entire space of 8-character passwords drawn from printable ASCII (95^8, about 6.6 × 10^15 candidates) falls in roughly four days. Adding a salt (a random per-user value stored alongside the hash) defeats precomputed rainbow tables and forces the attacker to crack each hash separately, but it does not slow down a targeted attack: the salt is stored next to the hash, so the attacker simply brute-forces each hash at full GPU speed.
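
Brute-force estimates like these come down to a single division, so they are easy to recompute for other assumptions. In the sketch below, the function name and the three guess rates are illustrative assumptions, not measured benchmarks; substitute figures for the hardware and algorithm you care about.

```python
# Worst-case time to exhaust a password keyspace at a given guess rate.
# The rates below are illustrative assumptions, not measured benchmarks.

def exhaust_seconds(alphabet_size, length, guesses_per_second):
    """Seconds needed to try every password of exactly `length` characters."""
    return alphabet_size ** length / guesses_per_second

keyspace = 95 ** 8  # 8 characters of printable ASCII
print(f"keyspace: {keyspace:.2e} candidates")

for rate in (5e9, 2e10, 1e11):
    days = exhaust_seconds(95, 8, rate) / 86_400
    print(f"{rate:.0e} guesses/s -> {days:.1f} days worst case")
```

On average an attacker finds a given password after searching half the keyspace, so expected times are half the worst-case figures.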

The right answer is a password hashing function designed to be slow and, ideally, memory-hard. bcrypt (1999) is still a solid choice: its work factor (cost) is tunable, and you should set it to the highest value your hardware tolerates for login latency (typically cost 12–14 as of 2025, meaning 2^12 to 2^14 iterations, roughly 250 ms to 1 s per hash on commodity hardware). scrypt (2009) adds memory hardness, making GPU and ASIC attacks more expensive by requiring significant RAM per hash. Argon2id (a variant of Argon2, which won the 2015 Password Hashing Competition and is specified in RFC 9106) is the current OWASP recommendation: it is memory-hard and resists side-channel attacks, with three tunable parameters (memory, iterations, parallelism). The OWASP-recommended minimum as of 2025 is Argon2id with 19 MiB of memory, 2 iterations, and a parallelism of 1.
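
In Python, bcrypt and Argon2id require third-party packages, so the following minimal sketch uses scrypt, which ships in the standard library's hashlib (on builds linked against OpenSSL 1.1 or later). The function names and the parameter values are assumptions chosen for illustration, not a recommended configuration.

```python
import hashlib
import hmac
import os

# Illustrative parameters only (n=2**14, r=8, p=1 needs about 16 MiB per hash).
# Tune them for your own hardware and latency budget.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) for storage alongside the user record."""
    salt = os.urandom(16)  # fresh random salt per password
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, expected_key: bytes) -> bool:
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(key, expected_key)
```

The salt is not secret; it is stored next to the derived key. What protects the password is the cost of each guess, which the parameters control.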

Real-world incidents make the stakes concrete. The 2012 LinkedIn breach exposed 6.5 million passwords hashed with unsalted SHA-1, and the large majority were reportedly cracked within days of the dump leaking, using commodity GPU hardware. The 2013 Adobe breach involved passwords encrypted with 3DES (not hashed at all) under a single key for all users; it exposed 153 million credentials along with plaintext password hints that made guessing trivial. The 2015 Ashley Madison breach is subtler: passwords were hashed with bcrypt, but for millions of accounts a legacy login token derived from the password with MD5 was stored alongside, and researchers cracked more than 11 million passwords by attacking the MD5 tokens instead of the bcrypt hashes. In every case, the damage was severe because some part of the password protection was inadequate by modern standards.

Length-extension attacks are a separate concern that applies to SHA-256 itself. SHA-256 (like MD5 and SHA-1) uses the Merkle–Damgård construction, which is vulnerable to length extension: given H(secret || message) and the length of the secret, an attacker can compute H(secret || message || padding || extension) without knowing the secret. The defense is HMAC (HMAC-SHA-256, RFC 2104), which wraps the hash in a construction that defeats length extension. Never use H(secret || message) directly for authentication; always use HMAC.
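
A minimal sketch of message authentication with HMAC-SHA-256, using Python's standard hmac module. The helper names, key, and messages are hypothetical and exist only for illustration.

```python
import hashlib
import hmac

def sign(secret: bytes, message: bytes) -> str:
    """Return an HMAC-SHA-256 tag for the message as a hex string."""
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify(secret: bytes, message: bytes, tag: str) -> bool:
    """Recompute the tag and compare without leaking timing information."""
    return hmac.compare_digest(sign(secret, message), tag)

# Hypothetical key and message, for illustration only.
secret = b"example-shared-secret"
tag = sign(secret, b"amount=100&to=alice")

print(verify(secret, b"amount=100&to=alice", tag))             # True
print(verify(secret, b"amount=100&to=alice&to=mallory", tag))  # False
```

The comparison uses hmac.compare_digest rather than == so that verification time does not reveal how many leading characters of a forged tag were correct.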

Side-by-side comparison

Dimension	SHA-256	MD5
Spec	FIPS 180-4, RFC 6234	RFC 1321 (1992)
Output size	256 bits (32 bytes, 64 hex)	128 bits (16 bytes, 32 hex)
Block size	512 bits	512 bits
Rounds	64	64
Design date	2001	1991
Collision resistance	Yes (no practical attacks)	Broken (Wang et al., 2004)
Preimage resistance	Yes	Theoretical weaknesses only
Speed (software)	~500 MB/s	~600 MB/s
Speed (SHA-NI hardware)	~3+ GB/s	No common acceleration
Status	Secure, recommended	Broken, do not use for security
Common uses	TLS, Bitcoin, code signing, Git (opt-in)	Legacy checksums, non-adversarial integrity
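
The output-size and block-size rows can be checked directly from Python's hashlib, which exposes both as attributes of a hash object. This is a small sketch; the sizes dictionary and the sample input are illustrative names and values, not part of either specification.

```python
import hashlib

sizes = {}
for name in ("sha256", "md5"):
    h = hashlib.new(name, b"hello")
    # digest_size and block_size are reported in bytes; convert to bits.
    sizes[name] = (h.digest_size * 8, h.block_size * 8, len(h.hexdigest()))
    print(f"{name}: {sizes[name][0]}-bit digest, {sizes[name][1]}-bit block, "
          f"{sizes[name][2]} hex chars")
```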

When to choose which

Choose SHA-256 for anything where an adversary might want to forge a collision: digital signatures, certificate fingerprints, file integrity against malicious substitution, commitment schemes, blockchain anchoring, content-addressable storage. SHA-256 is also the right choice for HMAC (HMAC-SHA-256) for message authentication, JWT signing (HS256, or RS256/ES256 which use SHA-256 internally), and any new system that needs a hash function. There is essentially no situation where MD5 is preferable to SHA-256 for security-sensitive code.

Choose MD5 only for non-adversarial integrity checks where MD5 is already the established format and replacing it would be costly. The classic case is verifying a downloaded file against the publisher's published MD5 — if your only concern is detecting a corrupted download (not a malicious substitution), MD5 works fine. Many Linux distribution mirrors still publish MD5 checksums alongside SHA-256 for legacy compatibility. Use MD5 in those cases by all means, but prefer SHA-256 when both are available.

Never use MD5 for password storage. Never. MD5 is too fast (tens of billions of hashes per second on a GPU), unsalted MD5 is trivially reversed via rainbow tables, and even salted MD5 falls to brute force on common passwords in seconds. The correct choice is a deliberately slow password hashing function: bcrypt, scrypt, or Argon2id (the OWASP recommendation). All three are tunable so that each guess stays expensive, and scrypt and Argon2id additionally demand significant memory per hash. SHA-256 alone is also wrong for passwords because it is too fast, though PBKDF2-HMAC-SHA-256 with a high iteration count is acceptable where Argon2 is unavailable.
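
Where PBKDF2-HMAC-SHA-256 is the fallback, a sketch along these lines shows the idea. The record format, the function names, and the iteration count are assumptions made for this example; check current guidance and benchmark on your own hardware before settling on a count.

```python
import hashlib
import hmac
import os

# Assumed iteration count for illustration; not a definitive recommendation.
ITERATIONS = 600_000

def make_record(password: str, iterations: int = ITERATIONS) -> str:
    """Derive a key and pack it into a hypothetical '$'-separated record."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # Store the parameters with the hash so the count can be raised later.
    return f"pbkdf2_sha256${iterations}${salt.hex()}${key.hex()}"

def check_record(password: str, record: str) -> bool:
    """Re-derive the key from the stored parameters and compare in constant time."""
    _scheme, iterations, salt_hex, key_hex = record.split("$")
    key = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(key, bytes.fromhex(key_hex))
```

Keeping the iteration count inside each record lets old and new hashes coexist while accounts are gradually rehashed at a higher cost.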

Never use MD5 for digital signatures, certificate fingerprints, or any system where an attacker could benefit from producing two inputs with the same hash. The Flame attack demonstrated real-world exploitation of this; do not assume your system is too obscure to be targeted.

A note on SHA-1: it is also broken (SHAttered attack, 2017, produced a real collision) and should be treated like MD5 — fine for non-adversarial checksums, never for security. Browsers stopped trusting SHA-1 certificates in 2017.

Conclusion

MD5 is cryptographically broken and has been for two decades. It survives because it is fast, simple, and good enough for non-adversarial file integrity checks where the threat model is corruption rather than tampering. SHA-256 is the modern standard: secure, widely accelerated in hardware, and the right default for every security-sensitive use. The one place neither belongs is password storage — for that, use Argon2id, bcrypt, or scrypt. If you remember one rule from this article, let it be that. The Hash Generator on this site produces MD5, SHA-1, SHA-256, and SHA-512 side by side so you can see the difference in output size, but please use the right algorithm for the right job.