Website Performance Optimization: A Practical Guide

Core Web Vitals: what Google actually measures

Google ranks pages partly on performance, and the metrics it uses are the Core Web Vitals. As of 2024, the three metrics are Largest Contentful Paint (LCP), Interaction to Next Paint (INP), and Cumulative Layout Shift (CLS). Each has a specific definition, a specific target, and a specific set of fixes.

LCP measures when the largest visible element finishes rendering. For most pages, that element is a hero image, a hero video poster, or a large block of text. The target is under 2.5 seconds for at least 75 percent of page loads. LCP captures perceived loading speed — the moment the user feels the page is there.

INP replaced First Input Delay (FID) in March 2024. It measures the time from a user interaction (a click, a tap, a key press) to the next frame the browser paints in response. The target is 200 milliseconds or less, again for at least 75 percent of page loads. INP captures interactivity: how quickly the page responds to the user. The old FID measured only the input delay of the first interaction; INP observes all interactions throughout the page's lifetime and reports the slowest one, except that on pages with many interactions it skips one outlier for every 50 interactions, which works out to roughly the 98th percentile.
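The selection rule for a single page view can be sketched as follows. This is a simplified approximation, assuming you already have the latency of every interaction in milliseconds; the function name is hypothetical, and the real metric is computed by the browser and RUM libraries, not by code like this.

```typescript
// Simplified sketch of how one page view's INP value is picked from the
// interaction latencies (in milliseconds) observed during that page view.
// Assumption: one of the slowest interactions is skipped per 50 interactions.
function estimateInp(interactionLatenciesMs: readonly number[]): number | undefined {
  if (interactionLatenciesMs.length === 0) {
    return undefined; // no interactions, so no INP value for this page view
  }
  const slowestFirst = [...interactionLatenciesMs].sort((a, b) => b - a);
  const skipped = Math.floor(interactionLatenciesMs.length / 50);
  const index = Math.min(skipped, slowestFirst.length - 1);
  return slowestFirst[index];
}

console.log(estimateInp([40, 320, 90])); // 320
```

With only three interactions nothing is skipped, so the slowest one is reported. A page view with 60 interactions would report its second-slowest.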

CLS measures visual stability — how much the visible content shifts around as the page loads. Every time an element moves because a font loaded, an image finally rendered, or a banner slid in from the top, the layout shifts. The target is under 0.1, a unitless score computed from the fraction of the viewport affected multiplied by the distance moved. CLS captures the frustration of trying to click something that moved out from under your cursor.
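To make the score concrete, here is a rough sketch of the arithmetic for a single shift. It assumes a simplified case: one rectangular element that moves only vertically and stays fully inside the viewport. The type and function names are hypothetical.

```typescript
interface VerticalShift {
  viewportWidth: number;  // CSS pixels
  viewportHeight: number; // CSS pixels
  elementWidth: number;
  elementHeight: number;
  moveY: number;          // vertical distance moved, in CSS pixels
}

function layoutShiftScore(shift: VerticalShift): number {
  const distance = Math.abs(shift.moveY);
  // Height of the union of the element's area before and after the move.
  const coveredHeight = shift.elementHeight + Math.min(distance, shift.elementHeight);
  const impactFraction =
    (shift.elementWidth * coveredHeight) / (shift.viewportWidth * shift.viewportHeight);
  // Distance is measured against the larger viewport dimension.
  const distanceFraction = distance / Math.max(shift.viewportWidth, shift.viewportHeight);
  return impactFraction * distanceFraction;
}

// A block filling the top half of a 400x800 viewport is pushed down 200 px.
console.log(layoutShiftScore({
  viewportWidth: 400, viewportHeight: 800,
  elementWidth: 400, elementHeight: 400, moveY: 200,
})); // 0.1875
```

That single shift affects 75 percent of the viewport and moves a quarter of its height, which is already well over the 0.1 target.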

These three metrics are field metrics: they are measured on real users in the wild, both by the Chrome User Experience Report (CrUX) and by any site that ships a Real User Monitoring (RUM) library. Lab tools approximate them, but the numbers that count for ranking are the field numbers.

A useful mental model: LCP is about getting the most important content visible quickly, INP is about keeping the main thread free to respond, and CLS is about not surprising the user with movement. Each maps to a different category of fix — image and asset optimization for LCP, JavaScript budgeting for INP, reserved space and stable layout for CLS.

The targets at a glance:

LCP under 2.5 seconds for 75 percent of page loads.
INP under 200 milliseconds for 75 percent of page loads.
CLS under 0.1 for 75 percent of page loads.
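The targets can be expressed as a small classifier. This is a sketch, not an official API: the "good" bounds are the targets listed here, while the "poor" bounds (4 seconds, 500 milliseconds, 0.25) are the ones Google publishes and are an addition to this guide. The names are hypothetical.

```typescript
type Metric = "LCP" | "INP" | "CLS";
type Rating = "good" | "needs improvement" | "poor";

// [upper bound of "good", upper bound of "needs improvement"].
// LCP and INP are in milliseconds; CLS is unitless.
const THRESHOLDS: Record<Metric, readonly [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
};

// Rates the 75th-percentile value of a metric, as collected from real users.
function rate(metric: Metric, p75Value: number): Rating {
  const [good, needsImprovement] = THRESHOLDS[metric];
  if (p75Value <= good) return "good";
  if (p75Value <= needsImprovement) return "needs improvement";
  return "poor";
}

console.log(rate("LCP", 2300)); // good
console.log(rate("INP", 350));  // needs improvement
console.log(rate("CLS", 0.31)); // poor
```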

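Hitting all three at the "good" threshold, measured at the 75th percentile of page loads, is what Google treats as passing the Core Web Vitals assessment, and it is the level that feeds ranking signals. Hitting the targets is not the same as keeping them: a single heavy third-party script added in a Friday deploy can push you below the threshold, and because the Chrome User Experience Report aggregates over a rolling 28-day window, the damage can linger for about a month.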

Measurement: Lighthouse, RUM, and the field vs lab gap

You cannot improve what you do not measure. Two categories of tools matter: lab tools, which run a controlled test on a fixed device and network, and field tools, which collect data from real users.

Lighthouse is the dominant lab tool. It runs in Chrome DevTools, from the command line and in CI (for example via Lighthouse CI), and behind hosted services such as PageSpeed Insights and WebPageTest. Lighthouse simulates a page load on a throttled mobile device (typically a Moto G Power with a slow 4G connection) and reports scores for performance, accessibility, SEO, and best practices. The performance score is a weighted combination of LCP, CLS, Total Blocking Time (TBT), and a few other metrics. Lighthouse cannot measure INP directly, because a lab run has no real user interactions; TBT serves as its lab proxy. Lighthouse is excellent for catching regressions during development and for diagnosing specific issues: it tells you which images are too large, which scripts block the main thread, which fonts cause layout shifts.
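As an illustration of the weighting, the sketch below combines per-metric scores into one performance score. The weights are the ones documented for Lighthouse 10 and later (First Contentful Paint 10 percent, Speed Index 10 percent, LCP 25 percent, Total Blocking Time 30 percent, CLS 25 percent). The per-metric scores, each between 0 and 1, are assumed to be given, since Lighthouse derives them from its own scoring curves. The names are hypothetical.

```typescript
interface MetricScores {
  fcp: number;        // First Contentful Paint score, 0..1
  speedIndex: number; // Speed Index score, 0..1
  lcp: number;        // Largest Contentful Paint score, 0..1
  tbt: number;        // Total Blocking Time score, 0..1 (lab stand-in for INP)
  cls: number;        // Cumulative Layout Shift score, 0..1
}

const WEIGHTS: MetricScores = { fcp: 0.1, speedIndex: 0.1, lcp: 0.25, tbt: 0.3, cls: 0.25 };

function performanceScore(scores: MetricScores): number {
  const keys = Object.keys(WEIGHTS) as (keyof MetricScores)[];
  const weighted = keys.reduce((sum, key) => sum + WEIGHTS[key] * scores[key], 0);
  return Math.round(weighted * 100);
}

// A page that is perfect except for a main thread that is blocked throughout.
console.log(performanceScore({ fcp: 1, speedIndex: 1, lcp: 1, tbt: 0, cls: 1 })); // 70
```

Main-thread blocking carries the largest single weight, which is one reason a heavy script can sink an otherwise fast page.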

PageSpeed Insights combines Lighthouse lab data with the Chrome User Experience Report field data for the URL. The field data is what Google uses for ranking; the lab data is what you use to fix problems. A page can score 95 on Lighthouse and still have poor field metrics if real users are on worse devices or networks than the simulation.

The field-lab gap is real and frequently misunderstood. Lab tools run on fast developer machines over simulated slow networks; field data runs on whatever your users actually have. A page that loads in 1.5 seconds on a developer's M2 MacBook over office wifi may take 8 seconds on a five-year-old Android phone on a congested cell network. Field data captures that gap; lab data does not. To close it, ship a RUM library — `web-vitals` from Google is the standard — and collect LCP, INP, and CLS from real sessions, then aggregate them by user geography, device class, and connection type.
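Once the samples reach your backend, the aggregation step might look like the sketch below. It assumes each beacon carries a metric name, a value, and a device-class label (a hypothetical field your own RUM code would attach), and it uses the nearest-rank method for the 75th percentile.

```typescript
interface VitalSample {
  metric: "LCP" | "INP" | "CLS";
  value: number;       // milliseconds for LCP and INP, unitless for CLS
  deviceClass: string; // hypothetical label attached by your RUM beacon
}

function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) throw new Error("percentile of an empty sample set");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length)); // nearest rank
  return sorted[rank - 1];
}

function p75ByDeviceClass(
  samples: readonly VitalSample[],
  metric: VitalSample["metric"],
): Record<string, number> {
  const grouped: Record<string, number[]> = {};
  for (const sample of samples) {
    if (sample.metric !== metric) continue;
    const bucket = grouped[sample.deviceClass] ?? [];
    bucket.push(sample.value);
    grouped[sample.deviceClass] = bucket;
  }
  const summary: Record<string, number> = {};
  for (const deviceClass of Object.keys(grouped)) {
    summary[deviceClass] = percentile(grouped[deviceClass], 75);
  }
  return summary;
}
```

Grouping before taking the percentile matters: a healthy overall p75 can hide a device class that is well over the target.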

A practical workflow: run Lighthouse locally on every pull request that touches the rendering path, ship the `web-vitals` library in production, and review the field numbers weekly. Watch for regressions by geography (a CDN edge that has degraded), by device class (a JavaScript change that disproportionately affects low-end phones), and by browser (a feature that performs worse in Safari than in Chrome).

Image optimization: usually half your bytes

Images are the largest payload on most web pages, and they are the most common cause of poor LCP. The good news is that image optimization is well-understood and the tools are mature.

Format first. Use WebP or AVIF instead of JPEG and PNG wherever possible. AVIF is supported in current versions of all major browsers (Edge was the last to add it, in early 2024) and typically produces files roughly half the size of JPEG at equivalent visual quality. WebP files are usually somewhat larger than AVIF, but WebP has been supported for longer, so it covers older browsers that AVIF does not. The HTML `<picture>` element lets you serve AVIF to browsers that support it and fall back to WebP, then to JPEG.

Always include `width` and `height` attributes on `<img>` tags. The browser uses them to reserve space for the image before it loads, which prevents layout shift (CLS). Without them, the image loads with zero height, then snaps to full size when complete, shifting everything below it.
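The format fallback and the explicit dimensions can be combined in one piece of markup. The helper below is a sketch of a template function that emits it; the file naming scheme (`basePath` plus an extension) is an assumption, and the inputs are assumed to be trusted, since nothing is HTML-escaped.

```typescript
interface HeroImage {
  basePath: string; // hypothetical path without extension, e.g. "/img/hero"
  alt: string;
  width: number;    // intrinsic pixel dimensions, used to reserve space
  height: number;
}

function pictureMarkup(image: HeroImage): string {
  return [
    "<picture>",
    // Browsers use the first <source> whose type they support.
    `  <source srcset="${image.basePath}.avif" type="image/avif">`,
    `  <source srcset="${image.basePath}.webp" type="image/webp">`,
    // The <img> is the JPEG fallback and carries the dimensions.
    `  <img src="${image.basePath}.jpg" alt="${image.alt}" width="${image.width}" height="${image.height}">`,
    "</picture>",
  ].join("\n");
}

console.log(pictureMarkup({ basePath: "/img/hero", alt: "Product hero", width: 1600, height: 900 }));
```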

Use `srcset` and `sizes` to serve appropriately sized images to different devices. A phone does not need to download the 4000-pixel-wide image you serve to a 4K monitor. `srcset` lets you list multiple resolutions, and the browser picks the right one based on the device's pixel density and the rendered size. The `sizes` attribute tells the browser how wide the image will be at different breakpoints, which lets it pick the right resolution before it has parsed your CSS.
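A minimal sketch of generating those two attributes follows. The file naming convention (`hero-480.jpg`, `hero-960.jpg`, and so on), the breakpoint, and the 1920 by 1080 dimensions are all assumptions for illustration.

```typescript
// Builds a width-descriptor srcset such as "/img/hero-480.jpg 480w, ...".
function buildSrcset(basePath: string, extension: string, widths: readonly number[]): string {
  return widths.map((w) => `${basePath}-${w}.${extension} ${w}w`).join(", ");
}

function responsiveImg(basePath: string, alt: string): string {
  const srcset = buildSrcset(basePath, "jpg", [480, 960, 1920]);
  // Full viewport width on narrow screens, half the viewport otherwise.
  const sizes = "(max-width: 768px) 100vw, 50vw";
  return (
    `<img src="${basePath}-960.jpg" srcset="${srcset}" sizes="${sizes}" ` +
    `width="1920" height="1080" alt="${alt}">`
  );
}

console.log(responsiveImg("/img/hero", "Product hero"));
```

The `src` attribute stays as a fallback for browsers that ignore `srcset`.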

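Lazy-load images that are not initially visible with `loading="lazy"`. The browser will defer loading them until they are close to the viewport, which saves bandwidth and reduces initial load. Do not lazy-load the LCP image, because that delays the very element the metric measures. Use `fetchpriority="high"` on the LCP image to tell the browser to fetch it ahead of other resources once it has been discovered. If the LCP image is discovered late (for example, it is a CSS background or is injected by JavaScript), add `<link rel="preload" as="image">` so the fetch starts before the parser would otherwise reach it.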
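The rule of thumb can be encoded in a template helper, sketched below. The three placement labels are hypothetical; deciding which image is actually the LCP element is still up to you, ideally from field data.

```typescript
type Placement = "lcp" | "above-fold" | "below-fold";

function loadingHints(placement: Placement): string {
  switch (placement) {
    case "lcp":
      return 'fetchpriority="high"'; // never lazy-load the LCP image
    case "above-fold":
      return "";                     // visible immediately: default loading
    case "below-fold":
      return 'loading="lazy"';       // defer until close to the viewport
  }
}

function imgTag(src: string, alt: string, width: number, height: number, placement: Placement): string {
  const attrs = [`src="${src}"`, `alt="${alt}"`, `width="${width}"`, `height="${height}"`];
  const hints = loadingHints(placement);
  if (hints !== "") attrs.push(hints);
  return `<img ${attrs.join(" ")}>`;
}

console.log(imgTag("/img/hero.jpg", "Hero", 1200, 600, "lcp"));
console.log(imgTag("/img/footer.jpg", "Footer", 800, 200, "below-fold"));
```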

A note on responsive images with art direction: if you need different crops for different screen sizes (a wide hero on desktop, a tighter crop on mobile), use the `<picture>` element with multiple `<source media="(min-width: 768px)">` elements. For simple scaling where the same crop works at all sizes, `srcset` alone is enough.

JavaScript: code splitting, tree shaking, and the cost of a kilobyte

JavaScript is the most expensive byte you can ship. The browser has to download it, parse it, compile it, and execute it — and on a mid-range phone, each of those steps is slower than you think. A 200-kilobyte JavaScript bundle can take a full second to parse and execute on a Moto G, even after the download is complete. The same 200 kilobytes of image is decoded in tens of milliseconds.

The first lever is shipping less. Tree shaking removes code you import but do not use, and it works when your bundler can statically analyze your imports. Use ES module imports (`import { foo } from 'lib'` rather than `const lib = require('lib')`), and avoid patterns that defeat static analysis. Side-effect-free packages marked with `"sideEffects": false` in their `package.json` can be tree-shaken aggressively.

Code splitting breaks your bundle into smaller chunks that load on demand. The most common pattern is route-based splitting: each route in a single-page app is its own chunk, loaded when the user navigates to it. React's `lazy()` and `Suspense`, Vue's `defineAsyncComponent`, and direct dynamic `import()` calls all produce code-split chunks. The win is that the initial load only contains the code for the initial route.
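Framework helpers aside, the core of route-based splitting is a loader that runs on first navigation and is reused afterwards. The sketch below shows that idea with a hypothetical `lazyOnce` wrapper. In a real app the loader would be a dynamic `import()` of a route module; here a stand-in loader is used so the example runs on its own.

```typescript
type Loader<T> = () => Promise<T>;

// Wraps a loader so the underlying fetch happens at most once, on first use.
function lazyOnce<T>(loader: Loader<T>): Loader<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = loader(); // first navigation triggers the chunk download
    }
    return pending;       // later navigations reuse the same promise
  };
}

// In an app this might be: lazyOnce(() => import("./routes/settings"))
// (a hypothetical module path). The stand-in below counts its invocations.
let fetchCount = 0;
const loadSettings = lazyOnce(async () => {
  fetchCount += 1;
  return { title: "Settings" };
});

loadSettings();
loadSettings();
console.log(fetchCount); // 1
```

This sketch does not retry a failed load. A production version would likely clear the cached promise on rejection.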

The second lever is shipping smarter. Defer non-critical JavaScript with `defer` (on classic scripts) or by using ES modules (which defer by default). Avoid `async` on scripts that depend on execution order or on the parsed DOM: an `async` script executes as soon as it finishes downloading, which can interrupt parsing and run ahead of scripts listed before it. Move third-party scripts (analytics, chat widgets, A/B testing) off the critical path; many of them can be loaded after the page is interactive, or moved into a web worker with a tool such as Partytown.

Audit your dependencies. Every npm package you import adds to your bundle. `bundlephobia.com` estimates the size of any package. Look for unnecessarily large dependencies, duplicate versions of the same package, and packages that pull in polyfills you do not need. A common offender is `moment.js`, which ships every locale by default and adds roughly 67 kilobytes to your bundle (the exact figure depends on the version and on compression); `date-fns` or the native `Intl` API covers most of the same needs at a fraction of the size.

A specific note on INP: the metric measures main thread responsiveness, which means a long task blocking the main thread on user input is the failure mode. Profile your click and keypress handlers; if any of them does more than 50 milliseconds of work synchronously, break it up. Often the fix is to defer non-critical work with `requestIdleCallback` or `scheduler.postTask()`, or to move it to a Web Worker.
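One way to break up a long handler is to process work in slices and yield between them. The sketch below uses `setTimeout` as the yield point because it is available everywhere; `scheduler.postTask()` could take its place where supported. The function names and the 40 millisecond budget are assumptions.

```typescript
// Resolves on a later task, giving the event loop a chance to run pending
// work (in a browser: input handlers and rendering) in between slices.
function yieldToMain(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

async function processInSlices<T>(
  items: readonly T[],
  handle: (item: T) => void,
  budgetMs = 40, // assumed budget, kept below the 50 ms long-task mark
): Promise<void> {
  let sliceStart = Date.now();
  for (const item of items) {
    handle(item);
    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToMain(); // hand control back before continuing
      sliceStart = Date.now();
    }
  }
}

// Usage with a hypothetical per-row handler.
const handledRows: number[] = [];
processInSlices([1, 2, 3, 4], (row) => { handledRows.push(row); })
  .then(() => console.log(`handled ${handledRows.length} rows`));
```

The budget is checked after each item, so a single item that takes longer than the budget still blocks for its full duration. Items need to be small for this to help.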

Caching: browser, CDN, service worker

Caching is the cheapest performance win available. A resource served from the browser cache costs zero network bytes and zero server time. The challenge is using cache headers correctly so that users get cached responses when possible and fresh responses when needed.

The `Cache-Control` header is the primary tool. For static assets that change only when the file content changes — JavaScript bundles, CSS files, images — use `Cache-Control: public, max-age=31536000, immutable`. The `immutable` flag tells the browser the resource will never change, so it should not revalidate it. This works only if you fingerprint your files (include a hash in the filename, like `app.a3f9c1.js`); when the content changes, the filename changes, and the browser fetches the new file. Without fingerprinting, long cache lifetimes prevent users from seeing updates.

For HTML, where you cannot fingerprint the URL, use `Cache-Control: no-cache` (which still allows the browser to cache but forces revalidation on each request) or `Cache-Control: max-age=0, must-revalidate`. If you want the CDN edge to absorb traffic spikes, let it cache the HTML briefly, typically a minute or two, by adding a shared-cache lifetime, for example `Cache-Control: max-age=0, s-maxage=60`; the edge then revalidates with the origin once that lifetime expires.

A Content Delivery Network (CDN) caches your assets at edge locations close to your users. Cloudflare, Fastly, AWS CloudFront, and the edge networks built into Vercel and Cloudflare Pages all do this. The benefit is latency: a request from London to a server in Virginia takes roughly 80 milliseconds round-trip; a request from London to a Cloudflare edge in London takes roughly 2. Cache-Control headers control how the CDN caches your content; without correct headers, the CDN may pass every request through to your origin.

Service workers let you cache resources programmatically in the browser, beyond what HTTP caching provides. They are how Progressive Web Apps work offline. A common pattern is a cache-first strategy for static assets: try the cache, fall back to the network, and cache the response for next time. For dynamic API responses, network-first or stale-while-revalidate strategies balance freshness with speed. Use Workbox, Google's service worker library, rather than writing the caching logic by hand — the edge cases are subtle and the library handles them.

One caching mistake that causes real bugs: caching too aggressively at the CDN for endpoints that return user-specific data. A `/api/me` response cached at the edge leaks one user's data to the next. The fix is `Cache-Control: private` for any response that varies by user, which tells the CDN not to cache it but still allows the user's browser to cache it.
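The header rules in this section can be collected into one small policy function, sketched below. The path conventions are assumptions for illustration: anything under `/api/` is treated as user-specific, and a fingerprinted asset is recognized by a hex hash of six or more characters before the file extension.

```typescript
type ResponseKind = "fingerprinted-asset" | "html" | "user-specific";

function kindForPath(path: string): ResponseKind {
  if (path.startsWith("/api/")) return "user-specific"; // assumed convention
  // Matches names like "app.a3f9c1.js": ".<hex hash>.<extension>" at the end.
  if (/\.[0-9a-f]{6,}\.[a-z0-9]+$/.test(path)) return "fingerprinted-asset";
  return "html";
}

function cacheControlFor(kind: ResponseKind): string {
  switch (kind) {
    case "fingerprinted-asset":
      return "public, max-age=31536000, immutable"; // safe: the URL changes with the content
    case "html":
      return "no-cache";                            // cacheable, but revalidated every time
    case "user-specific":
      return "private";                             // browser may cache, shared caches may not
  }
}

console.log(cacheControlFor(kindForPath("/static/app.a3f9c1.js")));
console.log(cacheControlFor(kindForPath("/pricing")));
console.log(cacheControlFor(kindForPath("/api/me")));
```

Falling through to the HTML policy for anything unrecognized is the conservative default: an unfingerprinted file is revalidated on every request, not cached for a year.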

Fonts: the invisible performance tax

Web fonts are a frequent cause of poor performance, and they are often invisible to developers because the fonts just work on a fast connection. On a slow connection, a font that takes three seconds to load can cause either a flash of unstyled text (FOUT), a flash of invisible text (FOIT), or a layout shift when the font finally loads.

The `font-display` CSS property controls this. The five values are `auto` (browser default, usually `block`), `block` (invisible text up to three seconds, then fallback), `swap` (fallback immediately, swap when the font loads), `fallback` (very short block, then fallback, with a brief swap window), and `optional` (very short block and no swap window; the browser decides based on network speed and may not load the font at all on slow connections). For most body text, `font-display: swap` is the right choice: users see text immediately, and the font swaps in when it arrives. For a logo where the font is the brand, `block` may be acceptable.

Preload your primary font with `<link rel="preload" as="font" type="font/woff2" href="/fonts/inter.woff2" crossorigin>`. The `crossorigin` attribute is mandatory even for same-origin fonts, because fonts are fetched in anonymous mode by default. Preloading tells the browser to start fetching the font early, before the CSS parser would otherwise discover it.

Subset your fonts. Many fonts include glyphs for scripts and languages you do not use. A Latin-only subset of Inter is roughly 30 kilobytes; the full font is roughly 300. Use `unicode-range` in your CSS so that additional subsets load only when characters from those ranges appear on the page. Or use a service like Google Fonts that handles subsetting automatically, though be aware that Google Fonts adds a render-blocking CSS request and a connection to a second origin for the font files; self-hosting is faster if you can manage it.

Variable fonts can reduce the number of font files you ship. A single variable font file can contain every weight along a continuous axis (italics are often shipped as a second file), which is more efficient than shipping separate files for regular, bold, italic, and bold-italic. The trade-off is that the variable font file is larger than any single static weight, so it only wins if you actually use multiple weights on the page.

Font loading also affects CLS. When a fallback font is replaced by the web font, the metrics usually change, which shifts the surrounding text. Use `size-adjust`, `ascent-override`, and `descent-override` in an `@font-face` declaration that wraps the fallback font, so that its metrics match the web font's and the swap does not move text. This is a small CSS detail with a large CLS impact.
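The override values can be derived from the two fonts' metrics. The sketch below follows a commonly used approach: scale the fallback by the ratio of average character widths, then express ascent and descent relative to the scaled size. The formula is stated here as an assumption, not a specification, and the metric values in the usage example are made up for illustration; real values come from the font files.

```typescript
interface FontMetrics {
  unitsPerEm: number;
  ascent: number;       // font units above the baseline
  descent: number;      // font units below the baseline, as a positive number
  avgCharWidth: number; // average glyph advance, in font units
}

function fallbackFontFace(
  family: string,
  localFallback: string,
  web: FontMetrics,
  fallback: FontMetrics,
): string {
  // How much wider (or narrower) the web font runs than the fallback.
  const sizeAdjust =
    web.avgCharWidth / web.unitsPerEm / (fallback.avgCharWidth / fallback.unitsPerEm);
  const pct = (ratio: number) => `${(ratio * 100).toFixed(2)}%`;
  // Overrides are relative to the adjusted size, so divide by sizeAdjust.
  const scaled = (units: number) => units / web.unitsPerEm / sizeAdjust;
  return [
    "@font-face {",
    `  font-family: "${family}";`,
    `  src: local("${localFallback}");`,
    `  size-adjust: ${pct(sizeAdjust)};`,
    `  ascent-override: ${pct(scaled(web.ascent))};`,
    `  descent-override: ${pct(scaled(web.descent))};`,
    "}",
  ].join("\n");
}

// Illustrative metrics only, not measured from any real font.
console.log(fallbackFontFace(
  "Inter Fallback",
  "Arial",
  { unitsPerEm: 1000, ascent: 990, descent: 220, avgCharWidth: 550 },
  { unitsPerEm: 1000, ascent: 900, descent: 210, avgCharWidth: 500 },
));
```

The generated family would then be listed after the web font in `font-family`, so text rendered in the fallback occupies nearly the same space as the final font.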

HTTP/2, HTTP/3, and the transport layer

The transport layer is the last piece, and for most teams, it is the one that requires the least work — modern hosting platforms handle it. But understanding what is happening under the hood helps you make decisions about how you bundle and ship assets.

HTTP/1.1, the protocol that dominated the web for two decades, has a fundamental limitation: browsers can open only six concurrent connections per origin. This is why the 2010s best practice was to concatenate assets into single files — one big JavaScript bundle, one big CSS file — to fit within the connection limit.

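HTTP/2, deployed widely since 2016, multiplexes multiple requests over a single connection. The six-connection limit no longer constrains you, and with it goes the main rationale for concatenation. Code splitting becomes far cheaper: you can ship dozens of small chunks with little performance penalty, and the browser fetches them in parallel over one connection. The penalty is not quite zero, because very small files compress less well and each request still carries some overhead, so splitting into hundreds of tiny files is not the goal.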

HTTP/3, increasingly available in 2025, runs over QUIC, which is built on UDP rather than TCP. The practical benefit is faster connection establishment (one round-trip instead of two or three) and better performance on lossy networks (a dropped packet does not block other streams). The deployment story is still uneven — HTTP/3 requires the server, the CDN, and the client to all support it — but most major CDNs now offer it.

The actionable advice is to make sure your hosting platform supports HTTP/2 at minimum (most do, by default, in 2025) and to stop concatenating assets for performance reasons. Code splitting is now generally better than a single bundle for cache efficiency: a user who has visited your site before only re-downloads the chunks that have changed, not the entire bundle. The old bundle-everything reflex is obsolete; unlearn it.
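The cache-efficiency argument is easy to check with arithmetic. The sketch below compares what a returning visitor has to download after a deploy, assuming fingerprinted file names so that a changed file gets a new name. The file names and sizes are made up for illustration.

```typescript
interface Chunk {
  file: string;  // file name including its content hash
  bytes: number;
}

// Bytes a returning visitor must fetch: every deployed file not already cached.
function bytesToDownload(cached: readonly Chunk[], deployed: readonly Chunk[]): number {
  const cachedFiles = new Set(cached.map((chunk) => chunk.file));
  return deployed
    .filter((chunk) => !cachedFiles.has(chunk.file))
    .reduce((sum, chunk) => sum + chunk.bytes, 0);
}

// Split build: only the home route changed in this deploy.
const splitBefore: Chunk[] = [
  { file: "vendor.1a2b3c.js", bytes: 120_000 },
  { file: "home.aa11bb.js", bytes: 30_000 },
  { file: "settings.cc22dd.js", bytes: 25_000 },
];
const splitAfter: Chunk[] = [
  { file: "vendor.1a2b3c.js", bytes: 120_000 },
  { file: "home.ee33ff.js", bytes: 31_000 },
  { file: "settings.cc22dd.js", bytes: 25_000 },
];

// Single bundle: any change renames the whole file.
const bundleBefore: Chunk[] = [{ file: "app.0f0f0f.js", bytes: 175_000 }];
const bundleAfter: Chunk[] = [{ file: "app.9e9e9e.js", bytes: 176_000 }];

console.log(bytesToDownload(splitBefore, splitAfter));   // 31000
console.log(bytesToDownload(bundleBefore, bundleAfter)); // 176000
```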

Performance is a continuous discipline. Measure, fix the biggest issue, measure again. The Core Web Vitals targets are achievable for almost any site that takes them seriously, and the ranking benefit of meeting them is real. Start with images and JavaScript, which together account for the majority of regressions, and work outward from there. The work is never finished, but the curve bends toward fast.