Observability

Monitoring with Model Monitor, CloudWatch, and CloudTrail

Build a comprehensive observability stack for ML systems.

What this covers

The monitoring stack described here correlates statistical drift, infrastructure health, and audit trails into a single operating picture.

Implementation trail

  • Data drift detection
  • Operational metrics
  • Audit logging
  • Alert routing
  • Dashboarding

Set up Model Monitor baselines

  • Profile training data to create baseline statistics and constraints.
  • Schedule monitoring jobs per endpoint with appropriate sampling strategies.
  • Store reports in S3 and register references in Model Registry entries.
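The baseline-and-schedule steps above can be sketched as a `create_monitoring_schedule` request. This is a minimal payload builder, not a definitive implementation: the endpoint name, bucket paths, image URI, and role ARN are placeholders you would supply, and several optional fields (network config, stopping conditions) are omitted for brevity.

```python
def monitoring_schedule_request(endpoint_name: str, baseline_uri: str,
                                report_uri: str, image_uri: str,
                                role_arn: str) -> dict:
    """Build an hourly data-quality monitoring schedule payload for one endpoint.

    The dict mirrors the shape of boto3's sagemaker
    create_monitoring_schedule request; constructing it as plain data
    lets you inspect or unit-test it before calling AWS.
    """
    return {
        "MonitoringScheduleName": f"{endpoint_name}-data-quality",
        "MonitoringScheduleConfig": {
            # Run at the top of every hour.
            "ScheduleConfig": {"ScheduleExpression": "cron(0 * ? * * *)"},
            "MonitoringJobDefinition": {
                # Constraints/statistics produced by the baseline job on training data.
                "BaselineConfig": {
                    "ConstraintsResource": {"S3Uri": f"{baseline_uri}/constraints.json"},
                    "StatisticsResource": {"S3Uri": f"{baseline_uri}/statistics.json"},
                },
                "MonitoringInputs": [{
                    "EndpointInput": {
                        "EndpointName": endpoint_name,
                        "LocalPath": "/opt/ml/processing/input",
                    }
                }],
                # Reports land in S3, from where they can be registered
                # against the Model Registry entry.
                "MonitoringOutputConfig": {
                    "MonitoringOutputs": [{
                        "S3Output": {
                            "S3Uri": report_uri,
                            "LocalPath": "/opt/ml/processing/output",
                        }
                    }]
                },
                "MonitoringResources": {
                    "ClusterConfig": {
                        "InstanceCount": 1,
                        "InstanceType": "ml.m5.xlarge",
                        "VolumeSizeInGB": 20,
                    }
                },
                "MonitoringAppSpecification": {"ImageUri": image_uri},
                "RoleArn": role_arn,
            },
        },
    }
```

With real values, the returned dict is passed unchanged to `boto3.client("sagemaker").create_monitoring_schedule(**request)`.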

Instrument infrastructure metrics

  • Collect endpoint latency, error, and resource metrics in CloudWatch and tag them with model version IDs.
  • Aggregate metrics by business capability to align with SLA reporting.
  • Use anomaly detection on CloudWatch metrics to surface subtle degradations.
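A sketch of the tagging and anomaly-detection steps, assuming a custom metric namespace (`MLPlatform/Endpoints` here is hypothetical): the first helper shapes a `put_metric_data` payload carrying the model version as a dimension, and the second shapes a `put_anomaly_detector` payload so CloudWatch learns the metric's normal band.

```python
def latency_metric(endpoint: str, model_version: str, latency_ms: float) -> dict:
    """put_metric_data payload: latency tagged with the model version ID,
    so dashboards and alarms can be sliced per deployed version."""
    return {
        "Namespace": "MLPlatform/Endpoints",  # hypothetical custom namespace
        "MetricData": [{
            "MetricName": "InferenceLatency",
            "Dimensions": [
                {"Name": "EndpointName", "Value": endpoint},
                {"Name": "ModelVersion", "Value": model_version},
            ],
            "Value": latency_ms,
            "Unit": "Milliseconds",
        }],
    }

def latency_anomaly_detector(endpoint: str) -> dict:
    """put_anomaly_detector payload: let CloudWatch model the normal p99
    latency band and surface subtle degradations."""
    return {
        "SingleMetricAnomalyDetector": {
            "Namespace": "MLPlatform/Endpoints",
            "MetricName": "InferenceLatency",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint}],
            "Stat": "p99",
        }
    }
```

Both dicts are passed as keyword arguments to the corresponding `boto3.client("cloudwatch")` calls.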

Capture governance signals

  • Enable CloudTrail data events for model endpoints, pipelines, and feature store operations.
  • Stream CloudTrail logs to Lake Formation-governed S3 buckets for long-term retention.
  • Correlate CloudTrail events with monitoring alerts to accelerate investigations.

Create a single pane of glass

We align Model Monitor, CloudWatch, and CloudTrail so platform, product, and compliance teams operate from the same dataset.
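One way to realize that shared view is a CloudWatch dashboard that places built-in `AWS/SageMaker` endpoint metrics side by side. This is a minimal sketch, assuming a single endpoint and a hard-coded region; a real dashboard would add widgets for Model Monitor violation counts and CloudTrail-derived change markers.

```python
import json

def dashboard_body(endpoint: str, region: str = "us-east-1") -> str:
    """Build a CloudWatch dashboard body (JSON string) pairing model
    latency with invocation errors for one endpoint. The result is
    passed as DashboardBody to cloudwatch put_dashboard."""
    widgets = [
        {"type": "metric", "x": 0, "y": 0, "width": 12, "height": 6,
         "properties": {
             "title": "Model latency (p99)",
             "metrics": [["AWS/SageMaker", "ModelLatency",
                          "EndpointName", endpoint]],
             "stat": "p99", "period": 300, "region": region,
         }},
        {"type": "metric", "x": 12, "y": 0, "width": 12, "height": 6,
         "properties": {
             "title": "Invocation 5XX errors",
             "metrics": [["AWS/SageMaker", "Invocation5XXErrors",
                          "EndpointName", endpoint]],
             "stat": "Sum", "period": 300, "region": region,
         }},
    ]
    return json.dumps({"widgets": widgets})
```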
