Image Quality Metrics SSIM/PSNR/Butteraugli — Practical Guide 2025

Published: Sep 19, 2025 · Reading time: 3 min · By Unified Image Tools Editorial

Objective metrics such as PSNR, SSIM, and Butteraugli are powerful for validating degradation from compression or resizing, but used blindly they can mislead. This guide explains each metric and shows how to integrate them reliably.

Why objective metrics (and their limits)

Human judgments vary (fatigue, lighting, bias), and modern pipelines process far more images than anyone can eyeball; visual review alone does not scale.

Limits to remember:

  • Context ignorance: metrics do not know the use-case (icon vs photo)
  • Perception gap: numerically better can still look worse
  • Local artifacts: global averages can hide local damage (text edges)

Tip: Combine multiple metrics and keep spot checks on real screens.

Metrics in detail

PSNR (Peak Signal-to-Noise Ratio)

PSNR measures global pixel error relative to full scale.

import cv2, numpy as np

def calculate_psnr(original, compressed):
    # Cast to float first: subtracting uint8 arrays wraps around and corrupts the MSE.
    mse = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 20 * np.log10(255.0 / np.sqrt(mse))

Interpretation:

  • < 30 dB: visible degradation
  • 30–35 dB: acceptable, review carefully
  • 35–40 dB: good
  • > 40 dB: excellent

Best suited to logos, line art, and sharp edges; as a global average it misses local artifacts.

SSIM (Structural Similarity)

SSIM compares luminance, contrast, and structure, which tracks human perception more closely than raw pixel error.

from skimage.metrics import structural_similarity as ssim
import cv2, numpy as np

def calculate_ssim(original, compressed):
    g1 = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(compressed, cv2.COLOR_BGR2GRAY)
    val, diff = ssim(g1, g2, full=True)
    # The SSIM map can dip below 0; clip before converting to an 8-bit image.
    diff = (np.clip(diff, 0, 1) * 255).astype(np.uint8)
    return val, diff

Interpretation:

  • ≥ 0.95: nearly indistinguishable
  • 0.90–0.95: high quality
  • 0.80–0.90: acceptable depending on use
  • < 0.80: noticeable degradation

Good for photos/natural images; sensitive to structural changes.

Butteraugli (Google)

Butteraugli computes a perceptual distance tuned to human vision and is particularly sensitive to color errors.

# Build and run (Linux)
git clone https://github.com/google/butteraugli.git && cd butteraugli && make
./butteraugli original.jpg compressed.jpg

Interpretation (lower is better):

  • < 1.0: excellent
  • 1.0–1.5: good
  • 1.5–3.0: acceptable
  • > 3.0: visible issues

Make comparisons fair

1) Unify color space

Convert both images to the same color space first (usually sRGB). See Color Management and ICC — sRGB/Display-P3/CMYK Handoff 2025.
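For images with embedded ICC profiles, one way to do this is Pillow's ImageCms module. The sketch below is illustrative (the function name to_srgb is ours, and images without a profile are assumed to already be sRGB):

```python
import io

from PIL import Image, ImageCms

def to_srgb(img: Image.Image) -> Image.Image:
    """Convert an image to sRGB using its embedded ICC profile, if any."""
    icc = img.info.get("icc_profile")
    if not icc:
        # No embedded profile: assume the pixels are already sRGB.
        return img.convert("RGB")
    src_profile = ImageCms.ImageCmsProfile(io.BytesIO(icc))
    srgb_profile = ImageCms.createProfile("sRGB")
    return ImageCms.profileToProfile(img, src_profile, srgb_profile, outputMode="RGB")
```

Run both the original and the compressed image through the same conversion before measuring.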

2) Match resolution/aspect

import cv2

def standardize_resolution(original, compressed, target_size=None):
    if target_size is None:
        h1, w1 = original.shape[:2]; h2, w2 = compressed.shape[:2]
        target_size = (min(w1, w2), min(h1, h2))
    return (
        cv2.resize(original, target_size, interpolation=cv2.INTER_LANCZOS4),
        cv2.resize(compressed, target_size, interpolation=cv2.INTER_LANCZOS4),
    )

3) Normalize bit depth

Ensure 8-bit vs 16-bit mismatches are resolved before measuring.
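A minimal sketch of that normalization (the helper name to_uint8 is ours; the scaling assumes full-range data):

```python
import numpy as np

def to_uint8(img: np.ndarray) -> np.ndarray:
    """Bring 16-bit or float image data down to 8-bit before comparing."""
    if img.dtype == np.uint16:
        # 65535 / 255 == 257, so this maps the full 16-bit range onto 0-255.
        return (img / 257).round().astype(np.uint8)
    if img.dtype in (np.float32, np.float64):
        # Assume floats are normalized to [0, 1].
        return (np.clip(img, 0.0, 1.0) * 255).round().astype(np.uint8)
    return img.astype(np.uint8)
```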

Batch assessment workflow

import csv, cv2
from pathlib import Path

def batch_quality_assessment(original_dir, compressed_dir, output_csv):
    results = []
    for original_path in Path(original_dir).glob('*.jpg'):
        comp = Path(compressed_dir) / original_path.name
        if not comp.exists():
            continue
        o = cv2.imread(str(original_path)); c = cv2.imread(str(comp))
        if o is None or c is None:
            continue  # skip unreadable files
        o, c = standardize_resolution(o, c)
        psnr_val = calculate_psnr(o, c)
        ssim_val, _ = calculate_ssim(o, c)
        results.append({ 'filename': original_path.name, 'psnr': psnr_val, 'ssim': ssim_val })
    with open(output_csv, 'w', newline='') as f:
        w = csv.DictWriter(f, fieldnames=['filename','psnr','ssim'])
        w.writeheader(); w.writerows(results)
    return results

CI to catch regressions

name: Image Quality Regression Test
on: { pull_request: { paths: ['assets/images/**'] } }
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
        with: { python-version: '3.11' }
      - run: pip install opencv-python scikit-image numpy
      - run: python scripts/quality_check.py --threshold-ssim 0.90 --threshold-psnr 30.0
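The workflow calls scripts/quality_check.py, which is not shown here. A minimal sketch of what such a script could look like (flag names match the workflow; the failing_files helper and the exit-code convention are our assumptions):

```python
import argparse

def failing_files(results, min_ssim, min_psnr):
    """Return names of files whose metrics fall below either threshold."""
    return [r["filename"] for r in results
            if r["ssim"] < min_ssim or r["psnr"] < min_psnr]

def main(argv=None):
    parser = argparse.ArgumentParser(description="Fail CI when image quality regresses")
    parser.add_argument("--threshold-ssim", type=float, default=0.90)
    parser.add_argument("--threshold-psnr", type=float, default=30.0)
    args = parser.parse_args(argv)
    # In the real script, results would come from batch_quality_assessment(...).
    results = []
    bad = failing_files(results, args.threshold_ssim, args.threshold_psnr)
    for name in bad:
        print(f"FAIL {name}")
    return 1 if bad else 0
```

A non-zero return code is what makes the CI step fail and block the pull request.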

Per content type

  • Photos: SSIM primary, Butteraugli secondary; aim SSIM > 0.90, Butteraugli < 1.5
  • Logos/icons: PSNR primary, SSIM secondary; aim PSNR > 35 dB, SSIM > 0.95
  • Charts: PSNR primary, local SSIM around axis/labels
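These targets can be encoded as a small lookup so the batch workflow picks the right floors per content type. The table below only covers the numeric targets stated above; the structure and names are illustrative, and the numbers should be tuned per project:

```python
# (limit, direction) per metric: "min" means higher is better, "max" lower is better.
THRESHOLDS = {
    "photo": {"ssim": (0.90, "min"), "butteraugli": (1.5, "max")},
    "logo": {"psnr": (35.0, "min"), "ssim": (0.95, "min")},
}

def passes(content_type, metrics):
    """True when every metric defined for this content type meets its limit."""
    for name, (limit, direction) in THRESHOLDS[content_type].items():
        value = metrics[name]
        if direction == "min" and value < limit:
            return False
        if direction == "max" and value > limit:
            return False
    return True
```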

Common pitfalls

  • Comparing different formats unfairly — normalize first (e.g., re-save as PNG) before measuring.
  • Single global metric hides local damage. Compute regional SSIM over tiles when text/edges matter.
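The regional check in the second bullet can be sketched by scoring non-overlapping tiles and reporting the worst one (the function name tile_ssim and the 64-pixel tile size are our choices):

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def tile_ssim(g1, g2, tile=64):
    """Minimum SSIM over non-overlapping tiles of two grayscale images.

    A badly damaged region drags this score down even when the
    global average still looks fine. Edge remainders smaller than
    one tile are ignored for simplicity.
    """
    h, w = g1.shape
    worst = 1.0
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            s = ssim(g1[y:y + tile, x:x + tile], g2[y:y + tile, x:x + tile])
            worst = min(worst, s)
    return worst
```

Gate on the minimum tile score rather than the global SSIM when text or edges matter.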

Advanced: perceptual weighting and multi-step chains

Weight SSIM by saliency maps; track quality across multi-step chains (save → resize → save) and fail when thresholds dip.
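Saliency weighting can be sketched by averaging the full SSIM map under a per-pixel weight. How the saliency map itself is produced is out of scope here; the function below (weighted_ssim, our name) just assumes one is available:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def weighted_ssim(g1, g2, weight):
    """Average the full SSIM map under a per-pixel weight (e.g. a saliency map)."""
    _, smap = ssim(g1, g2, full=True)
    w = weight.astype(np.float64)
    w = w / w.sum()  # normalize so the weights sum to 1
    return float((smap * w).sum())
```

With a uniform weight this reduces to plain SSIM; concentrating the weight on faces or text makes damage there dominate the score.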

Summary

Use multiple metrics, normalize input (colorspace, resolution, depth), and integrate CI to catch regressions. Keep human checks for critical visuals.