Optimize

The goal is to measure the served artifact against defined performance targets and improve the pipeline only when those targets are not met. Typical KPIs include p50/p95 latency, throughput, failure rate, startup time, memory usage, and cost.
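As a minimal sketch of the "improve only when targets are not met" rule, the check below compares measured KPIs against targets and returns only the ones that are missed. The KPI names, target values, and the `missed_targets` helper are hypothetical and chosen purely for illustration; real targets would come from the project's own requirements.

```python
# Hypothetical targets: "max" means the measured value must not exceed the limit,
# "min" means it must not fall below it.
TARGETS = {
    "p95_latency_ms": ("max", 200.0),
    "failure_rate": ("max", 0.01),
    "throughput_rps": ("min", 50.0),
}


def missed_targets(measured, targets=TARGETS):
    """Return {kpi: (measured_value, limit)} for every KPI that violates its target."""
    missed = {}
    for name, (kind, limit) in targets.items():
        value = measured[name]
        ok = value <= limit if kind == "max" else value >= limit
        if not ok:
            missed[name] = (value, limit)
    return missed


# Illustrative measurements, not real results.
measured = {"p95_latency_ms": 240.0, "failure_rate": 0.002, "throughput_rps": 65.0}
print(missed_targets(measured))  # {'p95_latency_ms': (240.0, 200.0)}
```

An empty result means every target is met and no optimization work is called for; a non-empty result names the KPIs worth investigating.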

For example, benchmarking in the ML artifact project compares model runtimes such as PyTorch, ONNX, or quantized variants, while benchmarking in the MLOps project measures service-level performance such as p50/p95 latency, failures, and requests per second.
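One way such a measurement could look is sketched below, using only the Python standard library. The `benchmark` helper and the `fake_predict` stand-in are assumptions for illustration; in practice the callable would wrap a model runtime's inference call or an HTTP request to the service. The loop is sequential, so the reported requests per second reflects single-client throughput, not behavior under concurrent load.

```python
import statistics
import time


def benchmark(call, n_requests=200):
    """Call `call()` n_requests times and summarize latency, failures, and throughput."""
    latencies_ms = []
    failures = 0
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        try:
            call()
        except Exception:
            # Count the failure and exclude it from the latency sample.
            failures += 1
            continue
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    elapsed = time.perf_counter() - start

    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    # It needs at least two data points, so fall back to None otherwise.
    cuts = statistics.quantiles(latencies_ms, n=100) if len(latencies_ms) >= 2 else None
    return {
        "p50_ms": cuts[49] if cuts else None,
        "p95_ms": cuts[94] if cuts else None,
        "failures": failures,
        "requests_per_second": n_requests / elapsed if elapsed > 0 else float("inf"),
    }


def fake_predict():
    """Hypothetical stand-in for a model or service call."""
    time.sleep(0.001)


print(benchmark(fake_predict, n_requests=50))
```

Passing one callable per runtime variant to the same helper keeps the comparison like for like, since every variant is timed by identical code.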