Image Quality Metrics SSIM/PSNR/Butteraugli — Practical Guide 2025
Published: Sep 19, 2025 · Reading time: 3 min · By Unified Image Tools Editorial
For validating degradation from compression or resizing, objective metrics like SSIM, PSNR, and Butteraugli are powerful. Used blindly, they can mislead. This guide explains each metric and shows how to integrate them reliably.
Why objective metrics (and their limits)
Human judgments vary with fatigue, lighting, and bias, and modern pipelines process far more images than humans can review, so visual checks alone do not scale.
Limits to remember:
- Context ignorance: metrics do not know the use-case (icon vs photo)
- Perception gap: numerically better can still look worse
- Local artifacts: global averages can hide local damage (text edges)
Tip: Combine multiple metrics and keep spot checks on real screens.
Metrics in detail
PSNR (Peak Signal-to-Noise Ratio)
PSNR measures global pixel error relative to full scale.
```python
import numpy as np

def calculate_psnr(original, compressed):
    # Cast to float first: subtracting uint8 arrays wraps around on underflow
    mse = np.mean((original.astype(np.float64) - compressed.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 20 * np.log10(255.0 / np.sqrt(mse))
```
Interpretation:
- < 30 dB: visible degradation
- 30–35 dB: acceptable, review carefully
- 35–40 dB: good
- > 40 dB: excellent
Best for logos/line art and sharp edges. Misses local artifacts.
SSIM (Structural Similarity)
SSIM compares luminance, contrast, and structure, closer to vision.
```python
from skimage.metrics import structural_similarity as ssim
import cv2
import numpy as np

def calculate_ssim(original, compressed):
    g1 = cv2.cvtColor(original, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(compressed, cv2.COLOR_BGR2GRAY)
    val, diff = ssim(g1, g2, full=True)
    diff = (diff * 255).astype(np.uint8)  # map the difference image to uint8 for viewing
    return val, diff
```
Interpretation:
- ≥ 0.95: nearly indistinguishable
- 0.90–0.95: high quality
- 0.80–0.90: acceptable depending on use
- < 0.80: noticeable degradation
Good for photos/natural images; sensitive to structural changes.
Butteraugli (Google)
Butteraugli estimates a perceptual distance tuned to human vision and is particularly sensitive to color errors.
```shell
# Build and run (Linux)
git clone https://github.com/google/butteraugli.git && cd butteraugli && make
./butteraugli original.jpg compressed.jpg
```
Interpretation (lower is better):
- < 1.0: excellent
- 1.0–1.5: good
- 1.5–3.0: acceptable
- > 3.0: visible issues
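In automation it is convenient to call the CLI from Python. A minimal sketch, assuming the `butteraugli` binary built above is on `PATH` and prints the distance as the first token of its output (the helper names are ours):

```python
import subprocess

def parse_butteraugli_output(stdout: str) -> float:
    """Parse the perceptual distance from butteraugli's stdout.

    Assumes the distance is the first whitespace-separated token.
    """
    return float(stdout.split()[0])

def butteraugli_distance(original: str, compressed: str) -> float:
    # Assumes a 'butteraugli' binary on PATH, built as shown above
    proc = subprocess.run(
        ['butteraugli', original, compressed],
        capture_output=True, text=True, check=True,
    )
    return parse_butteraugli_output(proc.stdout)
```

Keeping the parsing in its own function makes it easy to unit-test without the binary installed.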
Make comparisons fair
1) Unify color space
Convert both images to the same color space first (usually sRGB). See Color Management and ICC — sRGB/Display-P3/CMYK Handoff 2025.
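To make "same color space" concrete, here is a minimal sketch of the standard sRGB transfer functions for a single channel value in [0, 1]. It assumes any embedded ICC profile has already been decoded; with real images you would vectorize this with NumPy or let a color-management library do the whole conversion:

```python
def srgb_encode(linear: float) -> float:
    """Linear RGB -> sRGB-encoded value, both in [0, 1] (IEC 61966-2-1)."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055

def srgb_decode(encoded: float) -> float:
    """sRGB-encoded value -> linear RGB, both in [0, 1]."""
    if encoded <= 0.04045:
        return encoded / 12.92
    return ((encoded + 0.055) / 1.055) ** 2.4
```

Comparing an sRGB-encoded image against one in linear light inflates the measured error; decode both to the same representation before computing any metric.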
2) Match resolution/aspect
```python
import cv2

def standardize_resolution(original, compressed, target_size=None):
    if target_size is None:
        h1, w1 = original.shape[:2]
        h2, w2 = compressed.shape[:2]
        target_size = (min(w1, w2), min(h1, h2))
    # Pass interpolation by keyword: the third positional argument
    # of cv2.resize is dst, not the interpolation flag
    return (
        cv2.resize(original, target_size, interpolation=cv2.INTER_LANCZOS4),
        cv2.resize(compressed, target_size, interpolation=cv2.INTER_LANCZOS4),
    )
```
3) Normalize bit depth
Ensure 8-bit vs 16-bit mismatches are resolved before measuring.
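As a sketch of what "resolving the mismatch" means, one common convention maps full-range 16-bit samples onto 8-bit with rounding. Shown per value for clarity; with real images you would do this in one vectorized NumPy expression on the `uint16` array:

```python
def u16_to_u8(value: int) -> int:
    """Map a full-range 16-bit sample (0..65535) to 8-bit (0..255), rounding."""
    return (value * 255 + 32767) // 65535

def u8_to_u16(value: int) -> int:
    """Map an 8-bit sample to full-range 16-bit (257 == 65535 // 255)."""
    return value * 257
```

Whichever convention you pick, apply it to both images; mixing a truncating conversion on one side with a rounding one on the other adds a small systematic error to every metric.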
Batch assessment workflow
```python
import csv
import cv2
from pathlib import Path

def batch_quality_assessment(original_dir, compressed_dir, output_csv):
    results = []
    for original_path in Path(original_dir).glob('*.jpg'):
        comp = Path(compressed_dir) / original_path.name
        if not comp.exists():
            continue
        o = cv2.imread(str(original_path))
        c = cv2.imread(str(comp))
        o, c = standardize_resolution(o, c)
        psnr_val = calculate_psnr(o, c)
        ssim_val, _ = calculate_ssim(o, c)
        results.append({'filename': original_path.name, 'psnr': psnr_val, 'ssim': ssim_val})
    with open(output_csv, 'w', newline='') as f:
        w = csv.DictWriter(f, fieldnames=['filename', 'psnr', 'ssim'])
        w.writeheader()
        w.writerows(results)
    return results
```
CI to catch regressions
```yaml
name: Image Quality Regression Test
on:
  pull_request:
    paths: ['assets/images/**']
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v3
        with:
          python-version: '3.11'
      - run: pip install opencv-python scikit-image numpy
      - run: python scripts/quality_check.py --threshold-ssim 0.90 --threshold-psnr 30.0
```
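The script name `scripts/quality_check.py` is project-specific; its core can be as small as a pass/fail filter over the batch results. A minimal sketch of that gating logic:

```python
def find_regressions(results, threshold_ssim=0.90, threshold_psnr=30.0):
    """Return the entries that fall below either quality threshold.

    `results` is a list of dicts like those produced by the batch
    assessment above: {'filename': ..., 'psnr': ..., 'ssim': ...}.
    """
    return [
        r for r in results
        if r['ssim'] < threshold_ssim or r['psnr'] < threshold_psnr
    ]

# In the CI entry point, exit non-zero so the workflow fails the PR:
# raise SystemExit(1 if find_regressions(results) else 0)
```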
Per content type
- Photos: SSIM primary, Butteraugli secondary; aim SSIM > 0.90, Butteraugli < 1.5
- Logos/icons: PSNR primary, SSIM secondary; aim PSNR > 35 dB, SSIM > 0.95
- Charts: PSNR primary, local SSIM around axis/labels
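These per-type targets can live in one place as data. A minimal sketch, with the threshold numbers taken from the list above (the type names and structure are illustrative):

```python
THRESHOLDS = {
    # content type: thresholds; keys ending in '_max' mean lower is better
    'photo': {'ssim': 0.90, 'butteraugli_max': 1.5},
    'logo':  {'psnr': 35.0, 'ssim': 0.95},
}

def passes(content_type: str, measured: dict) -> bool:
    """True if every threshold defined for the content type is met."""
    for key, limit in THRESHOLDS[content_type].items():
        if key.endswith('_max'):
            if measured[key[:-4]] > limit:   # lower is better (Butteraugli)
                return False
        elif measured[key] < limit:          # higher is better (SSIM/PSNR)
            return False
    return True
```

Centralizing the numbers keeps CI, reports, and ad-hoc checks in agreement when a target changes.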
Common pitfalls
- Comparing different formats unfairly — normalize first (e.g., re-save as PNG) before measuring.
- Single global metric hides local damage. Compute regional SSIM over tiles when text/edges matter.
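The tiling itself is metric-agnostic. A minimal sketch that splits an image into fixed-size tiles and reports the worst per-tile score; plug in `calculate_ssim` or any other higher-is-better metric:

```python
def tile_boxes(width, height, tile=64):
    """Yield (x0, y0, x1, y1) boxes covering the image; edge tiles are clipped."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield (x, y, min(x + tile, width), min(y + tile, height))

def worst_tile_score(original, compressed, metric, size, tile=64):
    """Apply `metric(crop_a, crop_b)` per tile and return the minimum score.

    `size` is (width, height); `original`/`compressed` must support
    [y0:y1, x0:x1] slicing (e.g. NumPy arrays).
    """
    w, h = size
    return min(
        metric(original[y0:y1, x0:x1], compressed[y0:y1, x0:x1])
        for x0, y0, x1, y1 in tile_boxes(w, h, tile)
    )
```

Reporting the minimum rather than the mean is the point: a single badly damaged tile over text fails the check even when the global average looks fine.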
Advanced: perceptual weighting and time chains
Weight SSIM by saliency maps; track quality across multi-step chains (save → resize → save) and fail when thresholds dip.
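A minimal sketch of the chain idea, with hypothetical step names: apply each processing step in order, measure against the pristine original after every step, and flag the first step at which quality dips below the floor:

```python
def track_chain(original, steps, metric, floor):
    """Apply (name, fn) steps in order; return per-step scores and the
    name of the first step whose score drops below `floor` (or None).

    `metric(original, current)` must return higher-is-better scores.
    """
    current = original
    scores = []
    failed_at = None
    for name, fn in steps:
        current = fn(current)
        score = metric(original, current)
        scores.append((name, score))
        if failed_at is None and score < floor:
            failed_at = name
    return scores, failed_at
```

Measuring after every step, rather than only end-to-end, tells you which stage of the chain to tune when the final result is unacceptable.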
Summary
Use multiple metrics, normalize input (colorspace, resolution, depth), and integrate CI to catch regressions. Keep human checks for critical visuals.