Observability

Monitoring with Model Monitor, CloudWatch, and CloudTrail

Build a comprehensive observability stack for ML systems.

What this covers

The monitoring stack described here correlates statistical drift, infrastructure health, and audit trails into a single operating picture.

Implementation trail

  • Data drift detection
  • Operational metrics
  • Audit logging
  • Alert routing
  • Dashboarding

Set up Model Monitor baselines

  • Profile training data to create baseline statistics and constraints.
  • Schedule monitoring jobs per endpoint with appropriate sampling strategies.
  • Store reports in S3 and register references in Model Registry entries.
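The baseline-and-schedule steps above can be sketched as a `create_monitoring_schedule` request. This is a minimal payload builder, not a definitive implementation: the endpoint name, bucket paths, image URI, and role ARN are placeholders you would supply, and several optional fields (network config, stopping conditions) are omitted for brevity.

```python
def monitoring_schedule_request(endpoint_name: str, baseline_uri: str,
                                report_uri: str, image_uri: str,
                                role_arn: str) -> dict:
    """Build an hourly data-quality monitoring schedule payload for one endpoint.

    The dict mirrors the shape of boto3's sagemaker
    create_monitoring_schedule request; constructing it as plain data
    lets you inspect or unit-test it before calling AWS.
    """
    return {
        "MonitoringScheduleName": f"{endpoint_name}-data-quality",
        "MonitoringScheduleConfig": {
            # Run at the top of every hour.
            "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
            "MonitoringJobDefinition": {
                # Constraints/statistics produced by the baseline job on training data.
                "BaselineConfig": {
                    "ConstraintsResource": {"S3Uri": f"{baseline_uri}/constraints.json"},
                    "StatisticsResource": {"S3Uri": f"{baseline_uri}/statistics.json"},
                },
                "MonitoringInputs": [{
                    "EndpointInput": {
                        "EndpointName": endpoint_name,
                        "LocalPath": "/opt/ml/processing/input",
                    }
                }],
                # Reports land in S3, from where they can be registered
                # against the Model Registry entry.
                "MonitoringOutputConfig": {
                    "MonitoringOutputs": [{
                        "S3Output": {
                            "S3Uri": report_uri,
                            "LocalPath": "/opt/ml/processing/output",
                        }
                    }]
                },
                "MonitoringResources": {
                    "ClusterConfig": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.xlarge",
                        "VolumeSizeInGB": 20,
                    }
                },
                "MonitoringAppSpecification": {"ImageUri": image_uri},
                "RoleArn": role_arn,
            },
        },
    }
```

With real values, the returned dict is passed unchanged to `boto3.client("sagemaker").create_monitoring_schedule(**request)`.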

Instrument infrastructure metrics

  • Collect endpoint latency, error, and resource metrics in CloudWatch and tag them with model version IDs.
  • Aggregate metrics by business capability to align with SLA reporting.
  • Use anomaly detection on CloudWatch metrics to surface subtle degradations.
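A sketch of the tagging and anomaly-detection steps, assuming a custom metric namespace (`MLPlatform/Endpoints` here is hypothetical): the first helper shapes a `put_metric_data` payload carrying the model version as a dimension, and the second shapes a `put_anomaly_detector` payload so CloudWatch learns the metric's normal band.

```python
def latency_metric(endpoint: str, model_version: str, latency_ms: float) -> dict:
    """put_metric_data payload: latency tagged with the model version ID,
    so dashboards and alarms can be sliced per deployed version."""
    return {
        "Namespace": "MLPlatform/Endpoints",  # hypothetical custom namespace
        "MetricData": [{
            "MetricName": "InferenceLatency",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint},
                {"Name": "ModelVersion", "Value": model_version},
            ],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    }

def latency_anomaly_detector(endpoint: str) -> dict:
    """put_anomaly_detector payload: let CloudWatch model the normal p99
    latency band and surface subtle degradations."""
    return {
        "SingleMetricAnomalyDetector": {
            "Namespace": "MLPlatform/Endpoints",
            "MetricName": "InferenceLatency",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint}],
            "Stat": "p99",
        }
    }
```

Both dicts are passed as keyword arguments to the corresponding `boto3.client("cloudwatch")` calls.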

Capture governance signals

  • Enable CloudTrail data events for model endpoints, pipelines, and feature store operations.
  • Stream CloudTrail logs to Lake Formation-governed S3 buckets for long-term retention.
  • Correlate CloudTrail events with monitoring alerts to accelerate investigations.

Create a single pane of glass

We align Model Monitor, CloudWatch, and CloudTrail so platform, product, and compliance teams operate from the same dataset.
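One way to realize that shared view is a CloudWatch dashboard that places built-in `AWS/SageMaker` endpoint metrics side by side. This is a minimal sketch, assuming a single endpoint and a hard-coded region; a real dashboard would add widgets for Model Monitor violation counts and CloudTrail-derived change markers.

```python
import json

def dashboard_body(endpoint: str, region: str = "us-east-1") -> str:
    """Build a CloudWatch dashboard body (JSON string) pairing model
    latency with invocation errors for one endpoint. The result is
    passed as DashboardBody to cloudwatch put_dashboard."""
    widgets = [
        {"type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
         "properties": {
             "title": "Model latency (p99)",
             "metrics": [["AWS/SageMaker", "ModelLatency",
                          "EndpointName", endpoint]],
             "stat": "p99", "period": 300, "region": region,
         }},
        {"type": "metric", "x": 12, "y": 0, "width": 12, "height": 6,
         "properties": {
             "title": "Invocation 5XX errors",
             "metrics": [["AWS/SageMaker", "Invocation5XXErrors",
                          "EndpointName", endpoint]],
             "stat": "Sum", "period": 300, "region": region,
         }},
    ]
    return json.dumps({"widgets": widgets})
```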
