Start each pipeline with data visibility tasks
- Run a SageMaker Processing job that profiles the latest data snapshot, capturing distribution metrics and data volume deltas.
- Store profiling outputs in S3 and register them as a Glue table so analysts can query historical data quality trends.
- Push summary statistics to CloudWatch Metrics so alarms fire when new data deviates materially from the baseline set by prior weeks.
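
The tasks above can be sketched as a small profiling script. This is a minimal illustration, not a definitive implementation: the function names, the metric names, the `DataQuality/Profiling` namespace, and the `Dataset`/`Column` dimensions are all assumptions made for the example. Only `put_metric_data` is a real boto3 CloudWatch call, and it is imported lazily so the profiling logic can be exercised without AWS access.

```python
import json
import statistics
from typing import Dict, List, Optional, Sequence


def profile_column(values: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one numeric column: row count, null rate, mean, std deviation."""
    present = [v for v in values if v is not None]
    total = len(values)
    return {
        "row_count": float(total),
        "null_rate": (total - len(present)) / total if total else 0.0,
        "mean": statistics.fmean(present) if present else 0.0,
        "stddev": statistics.pstdev(present) if len(present) > 1 else 0.0,
    }


def volume_delta(current_rows: int, previous_rows: int) -> float:
    """Relative change in row count versus the previous snapshot."""
    if previous_rows == 0:
        return 0.0  # no baseline yet, so report no change
    return (current_rows - previous_rows) / previous_rows


def profile_record(dataset: str, column: str, snapshot_date: str,
                   profile: Dict[str, float], delta: float) -> str:
    """Serialize one profile as a JSON line, a layout a Glue table could sit on."""
    record = {"dataset": dataset, "column": column,
              "snapshot_date": snapshot_date, "volume_delta": delta, **profile}
    return json.dumps(record, sort_keys=True)


def build_metric_data(dataset: str, column: str,
                      profile: Dict[str, float], delta: float) -> List[dict]:
    """Build the MetricData payload for CloudWatch put_metric_data."""
    dimensions = [
        {"Name": "Dataset", "Value": dataset},
        {"Name": "Column", "Value": column},
    ]
    # Metric names are hypothetical; pick names that match your alarms.
    values = {
        "RowCount": (profile["row_count"], "Count"),
        "NullRate": (profile["null_rate"], "None"),
        "Mean": (profile["mean"], "None"),
        "StdDev": (profile["stddev"], "None"),
        "VolumeDelta": (delta, "None"),
    }
    return [
        {"MetricName": name, "Dimensions": dimensions, "Value": value, "Unit": unit}
        for name, (value, unit) in values.items()
    ]


def publish(metric_data: List[dict], namespace: str = "DataQuality/Profiling") -> None:
    """Send the metrics to CloudWatch; requires boto3 and AWS credentials."""
    import boto3  # lazy import keeps the functions above usable offline

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace, MetricData=metric_data
    )
```

Inside a Processing job, the script would call `profile_column` per column, write the `profile_record` lines to the job's output location for S3, and pass the result of `build_metric_data` to `publish`. Alarm thresholds are deliberately left out, since what counts as a material deviation depends on the dataset.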
