AWS ML Engineer Associate (MLA-C01) Study Guide & Cheat Sheet

A free study guide for the AWS ML Engineer Associate (MLA-C01) exam: exam facts, the domain breakdown, study tips, a topic cheat sheet, and a full glossary. No sign-up needed.


AWS Certified Machine Learning Engineer - Associate (MLA-C01) Study Guide

Questions: 65 (50 scored + 15 unscored)
Time limit: 130 minutes
Price: $150 USD
Delivery: Pearson VUE; test center or online proctored
Scoring: Scaled 100-1000; 720 to pass
Validity: 3 years
Prerequisites: None required; ~1 year of ML/SageMaker experience recommended
Language: English plus other localized languages

Exam domains

Domain (weight): What it covers
Data Preparation for ML (28%): Ingest, store, transform, and validate data for ML. Covers S3 storage, Glue/DataBrew and EMR/Spark transforms, Athena queries, streaming ingestion with Kinesis/Firehose/MSK, feature engineering with SageMaker Data Wrangler and Feature Store, labeling with Ground Truth, governance with Lake Formation, and handling issues such as class imbalance and data quality.
ML Model Development (26%): Choose, train, and evaluate models. Includes selecting built-in algorithms or JumpStart models, SageMaker training jobs, Automatic Model Tuning and Autopilot, tracking with Experiments, debugging with Debugger, bias/explainability checks with Clarify, and using metrics (accuracy, precision, recall, F1, AUC, RMSE) while managing overfitting and underfitting.
Deployment and Orchestration of ML Workflows (22%): Deploy models and automate pipelines. Covers real-time, serverless, and asynchronous endpoints, Batch Transform, multi-model endpoints, and auto scaling, plus orchestration with SageMaker Pipelines, Step Functions, and EventBridge, CI/CD via CodePipeline, infrastructure as code with CloudFormation/CDK, and container images in Amazon ECR.
ML Solution Monitoring, Maintenance, and Security (24%): Monitor, maintain, and secure ML solutions in production. Includes SageMaker Model Monitor for data and model quality drift, observability with CloudWatch and CloudTrail, cost and performance optimization, and security through IAM, KMS, VPC, Secrets Manager, PrivateLink, and Macie for sensitive data.
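
As a rough planning aid, the domain weights can be turned into approximate question counts. The sketch below assumes the 50 scored questions are split in exact proportion to the weights; AWS does not promise that, so treat the numbers as estimates only. The function name is made up for this illustration.

```python
# Approximate scored-question counts per domain.
# Assumption: the 50 scored questions follow the published weights exactly.
SCORED_QUESTIONS = 50

DOMAIN_WEIGHTS = {
    "Data Preparation for ML": 28,
    "ML Model Development": 26,
    "Deployment and Orchestration of ML Workflows": 22,
    "ML Solution Monitoring, Maintenance, and Security": 24,
}


def expected_questions(weights, scored=SCORED_QUESTIONS):
    """Return {domain: expected number of scored questions}."""
    return {name: scored * pct / 100 for name, pct in weights.items()}


if __name__ == "__main__":
    for name, count in expected_questions(DOMAIN_WEIGHTS).items():
        print(f"{name}: about {count:g} scored questions")
```

Under that assumption the split works out to roughly 14, 13, 11, and 12 scored questions, which is a reasonable guide for dividing study time.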

Who it’s for: ML engineers and data professionals who build, deploy, and operate machine learning solutions on AWS in a hands-on, associate-level role.

Study & test-day tips

  • Map every scenario to the AWS-managed service first: prefer SageMaker built-in tooling (Data Wrangler, Autopilot, Pipelines) over custom code when the question rewards lower operational overhead.
  • Know endpoint trade-offs cold: real-time for low-latency steady traffic, serverless for intermittent/spiky traffic, asynchronous for large payloads or long inference, and Batch Transform for offline scoring of whole datasets.
  • Match the metric to the problem: precision/recall/F1 and AUC for imbalanced classification, accuracy only for balanced classes, and RMSE/MAE for regression. Watch for questions that penalize false positives vs false negatives.
  • For class imbalance, recognize techniques like resampling, SMOTE, class weighting, and stratified splits, and pair them with the right evaluation metric rather than raw accuracy.
  • Use SageMaker Clarify for both pre-training bias detection and post-training feature attribution/explainability; use Debugger for training-time issues like vanishing gradients or overfitting.
  • For drift, remember Model Monitor compares live traffic against a baseline for data quality, model quality, bias drift, and feature attribution drift, emitting CloudWatch metrics that can trigger retraining.
  • Choose ingestion by pattern: Kinesis Data Streams for custom real-time consumers, Data Firehose for near-real-time delivery to S3/Redshift, and MSK for Kafka workloads; use Glue/DataBrew for batch ETL and Athena for serverless SQL on S3.
  • For MLOps automation, use SageMaker Pipelines for ML-native DAGs, Step Functions for broader AWS service orchestration, EventBridge for event triggers, and CodePipeline for CI/CD of model build and deploy.
  • Default to least privilege and encryption: scope IAM roles tightly, encrypt with KMS, isolate with VPC and PrivateLink, store credentials in Secrets Manager, and detect sensitive data with Macie.
  • Reuse features and consolidate hosting: in Feature Store, use the offline store for training and the online store for low-latency lookups; host many infrequently invoked models behind one multi-model endpoint to cut cost.
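
To see why accuracy misleads on imbalanced classes, it helps to compute the metrics by hand. The following is a minimal sketch in plain Python; the confusion-matrix counts are made-up illustration data, not results from any real model.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical data: 1,000 samples, 10 positives, model always predicts negative.
    always_negative = classification_metrics(tp=0, fp=0, fn=10, tn=990)
    print(always_negative)  # high accuracy, but recall and F1 are zero
```

A model that never predicts the positive class still scores 99% accuracy here, while recall and F1 drop to zero. That gap is the reason exam scenarios with rare positives point toward precision, recall, F1, or AUC.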

Cheat sheet

Data Preparation and Feature Engineering

  • Amazon S3 as the central data lake; Lake Formation for fine-grained access governance.
  • AWS Glue and Glue DataBrew for ETL and visual data prep; EMR/Spark for large-scale processing.
  • Athena for serverless SQL queries directly on S3 data.
  • SageMaker Data Wrangler for visual feature engineering; Feature Store for online/offline feature reuse.
  • SageMaker Ground Truth for data labeling and human-in-the-loop annotation.
  • Handle class imbalance (resampling, SMOTE, class weights) and split data with stratification.
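
Class weighting and stratified splitting can be sketched without any ML library. The example below is an illustration under stated assumptions: the weight formula is the common "balanced" heuristic (n_samples / (n_classes * class_count)), and both function names are invented for this sketch rather than taken from SageMaker.

```python
import random
from collections import Counter, defaultdict


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # hypothetical 90/10 imbalanced dataset
    print(balanced_class_weights(labels))
    train_idx, test_idx = stratified_split(labels)
    print(len(train_idx), len(test_idx))
```

With a 90/10 split of classes, the minority class receives a weight of 5.0 and the test set keeps the same 10% minority share as the full dataset.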

Streaming and Ingestion

  • Kinesis Data Streams: real-time, custom consumers, replayable shards.
  • Amazon Data Firehose: near-real-time delivery to S3, Redshift, OpenSearch with transforms.
  • Amazon MSK: managed Apache Kafka for high-throughput event streaming.
  • Glue/DataBrew for scheduled batch ingestion and transformation into the lake.
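
For Kinesis Data Streams, a producer sends each record with a stream name, a data blob, and a partition key that determines the shard. The sketch below only builds the request arguments; the stream name, event fields, and helper name are hypothetical. With AWS credentials configured, the resulting dictionary could be passed to the boto3 Kinesis client's `put_record` call.

```python
import json


def build_put_record_kwargs(stream_name, event, partition_key_field):
    """Build keyword arguments for a Kinesis put_record request.

    Records that share a partition key land on the same shard,
    which preserves their relative order for consumers.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": str(event[partition_key_field]),
    }


if __name__ == "__main__":
    # Hypothetical clickstream event and stream name.
    event = {"user_id": 42, "action": "view", "item": "sku-123"}
    kwargs = build_put_record_kwargs("clickstream-demo", event, "user_id")
    print(kwargs)
    # With boto3 installed and credentials set up, the send would look like:
    #   boto3.client("kinesis").put_record(**kwargs)
```

Keying on a field such as a user ID keeps one user's events in order, at the cost of uneven shard load if a few keys dominate the traffic.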

Model Development and Training

  • SageMaker built-in algorithms (XGBoost, Linear Learner, etc.) and JumpStart pretrained models.
  • SageMaker training jobs with managed spot training to reduce cost.
  • Automatic Model Tuning (hyperparameter optimization) and Autopilot for AutoML.
  • SageMaker Experiments to track runs, parameters, and metrics.
  • Debugger for training issues; Clarify for bias detection and explainability.
  • Evaluation metrics: accuracy, precision, recall, F1, AUC for classification; RMSE/MAE for regression.
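
The regression metrics and AUC are simple enough to compute directly, which can help when reading exam scenarios. This is a minimal sketch in plain Python with made-up inputs. The AUC function uses the pairwise definition (the chance that a random positive scores above a random negative), which is fine for small examples but slow on large datasets.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(rmse([3, 5], [1, 5]), mae([3, 5], [1, 5]))
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the regression example a single error of 2 gives an MAE of 1.0 but an RMSE of about 1.41, showing how RMSE weights larger misses more heavily.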

Deployment and Inference

  • Real-time endpoints for low-latency, steady traffic with auto scaling.
  • Serverless endpoints for intermittent traffic with no idle cost.
  • Asynchronous endpoints for large payloads and long-running inference.
  • Batch Transform for offline scoring of entire datasets.
  • Multi-model endpoints to host many models behind one endpoint and cut cost.
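
The trade-offs above can be written as a small decision helper. This is a study aid, not an AWS API: the function and parameter names are invented, and the order in which the checks are applied is an assumption about how exam scenarios usually resolve.

```python
def choose_inference_option(
    offline_dataset=False,
    large_payload_or_long_running=False,
    intermittent_traffic=False,
):
    """Map workload traits to a SageMaker inference option.

    Assumed precedence: offline scoring first, then payload size or
    duration, then traffic pattern; real-time is the default.
    """
    if offline_dataset:
        return "Batch Transform"
    if large_payload_or_long_running:
        return "Asynchronous endpoint"
    if intermittent_traffic:
        return "Serverless endpoint"
    return "Real-time endpoint"


if __name__ == "__main__":
    print(choose_inference_option(offline_dataset=True))
    print(choose_inference_option(large_payload_or_long_running=True))
    print(choose_inference_option(intermittent_traffic=True))
    print(choose_inference_option())
```

Multi-model endpoints are left out on purpose: they address how many models share one endpoint, which is a separate question from the traffic pattern.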

Orchestration and MLOps

  • SageMaker Pipelines for ML-native CI/CD workflows and model registry.
  • Step Functions for orchestrating across AWS services; EventBridge for event triggers.
  • CodePipeline for CI/CD of build, train, and deploy stages.
  • CloudFormation and AWS CDK for infrastructure as code.
  • Amazon ECR to store and version custom training/inference container images.
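
An ML pipeline is a directed acyclic graph (DAG): each step runs only after the steps it depends on. The sketch below models that idea with Python's standard `graphlib` module (Python 3.9+). It does not use the SageMaker SDK, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key is a step; its value is the set of steps that must finish first.
# Step names are illustrative, not SageMaker identifiers.
PIPELINE_DEPENDENCIES = {
    "train": {"preprocess"},
    "baseline": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate", "baseline"},
}


def execution_order(dependencies):
    """Return one valid order in which the pipeline steps could run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE_DEPENDENCIES))
```

Here "train" and "baseline" both depend only on "preprocess", so an orchestrator would be free to run them in parallel before "register" waits on both branches.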

Monitoring, Maintenance, and Security

  • SageMaker Model Monitor for data quality, model quality, and drift detection against a baseline.
  • CloudWatch for metrics, logs, and alarms; CloudTrail for API audit trails.
  • IAM least-privilege roles and policies for all ML resources.
  • KMS encryption, VPC isolation, and PrivateLink for private connectivity.
  • Secrets Manager for credentials; Macie to discover and protect sensitive data (PII).
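
Drift detection against a baseline boils down to comparing the distribution of a feature at training time with its distribution in live traffic. The sketch below uses the Population Stability Index (PSI) as one generic way to do that. It is an illustration only: Model Monitor computes its own statistics and constraints, and the 0.2 threshold is a common rule of thumb assumed here, not an AWS default.

```python
import math

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, not an AWS setting


def population_stability_index(baseline_counts, live_counts, eps=1e-6):
    """PSI over matching histogram bins: sum((live - base) * ln(live / base))."""
    base_total = sum(baseline_counts)
    live_total = sum(live_counts)
    score = 0.0
    for base, live in zip(baseline_counts, live_counts):
        # Clamp proportions so empty bins do not cause log(0) or division by zero.
        expected = max(base / base_total, eps)
        actual = max(live / live_total, eps)
        score += (actual - expected) * math.log(actual / expected)
    return score


def has_drifted(baseline_counts, live_counts, threshold=DRIFT_THRESHOLD):
    """Flag drift when PSI exceeds the chosen threshold."""
    return population_stability_index(baseline_counts, live_counts) > threshold


if __name__ == "__main__":
    baseline = [50, 50]  # hypothetical bin counts from training data
    live = [90, 10]      # hypothetical bin counts from production traffic
    print(population_stability_index(baseline, live), has_drifted(baseline, live))
```

In a production setup the equivalent signal would surface as a CloudWatch metric, where an alarm could kick off retraining.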

Glossary

Amazon S3
Object storage service that serves as the central data lake for ML datasets, models, and artifacts.
AWS Glue
Serverless ETL service for discovering, cataloging, and transforming data at scale.
AWS Glue DataBrew
Visual data preparation tool for cleaning and normalizing data without writing code.
Amazon Athena
Serverless interactive query service that runs SQL directly against data in S3.
Amazon EMR
Managed big data platform running Apache Spark, Hadoop, and related frameworks for large-scale processing.
Amazon Kinesis Data Streams
Real-time streaming data service with replayable shards for custom consumer applications.
Amazon Data Firehose
Near-real-time streaming delivery service that loads data into S3, Redshift, and other destinations.
Amazon MSK
Managed Streaming for Apache Kafka, providing fully managed Kafka clusters for event streaming.
SageMaker Data Wrangler
Visual tool for importing, exploring, and engineering features from data for ML.
SageMaker Feature Store
Repository for storing, sharing, and serving ML features with online and offline stores.
SageMaker Ground Truth
Data labeling service with human and automated workflows to create training datasets.
AWS Lake Formation
Service for building and governing data lakes with fine-grained access control.
SageMaker training job
Managed compute job that trains a model on provided data and outputs model artifacts to S3.
Built-in algorithms
Prebuilt, optimized SageMaker algorithms such as XGBoost and Linear Learner for common ML tasks.
Automatic Model Tuning
SageMaker hyperparameter optimization that searches parameter combinations to maximize an objective metric.
SageMaker Autopilot
AutoML capability that automatically builds, trains, and tunes candidate models with transparency.
SageMaker JumpStart
Hub of pretrained models and solution templates for fast fine-tuning and deployment.
SageMaker Experiments
Capability to organize, track, and compare training runs, parameters, and metrics.
SageMaker Debugger
Tool that captures training tensors to detect issues like overfitting and vanishing gradients.
SageMaker Clarify
Service that detects bias in data and models and explains predictions via feature attribution.
Real-time endpoint
Persistent SageMaker endpoint serving low-latency, synchronous predictions with auto scaling.
Serverless endpoint
SageMaker inference option that scales to zero, ideal for intermittent or spiky traffic.
Asynchronous endpoint
SageMaker endpoint that queues requests for large payloads and long-running inference.
Batch Transform
SageMaker feature for offline, high-throughput inference over an entire dataset.
Multi-model endpoint
Single SageMaker endpoint that dynamically loads and hosts many models to reduce cost.
SageMaker Pipelines
Purpose-built CI/CD service for building, automating, and managing ML workflows.
AWS Step Functions
Serverless orchestration service that coordinates multiple AWS services into workflows.
Amazon EventBridge
Event bus service that triggers ML workflows and actions based on events.
Amazon ECR
Elastic Container Registry for storing and versioning Docker images for training and inference.
SageMaker Model Monitor
Service that monitors deployed models for data quality, model quality, and drift against a baseline.
Model drift
Degradation in model performance over time as live data diverges from training data distribution.
Amazon Macie
Security service that uses ML to discover and protect sensitive data such as PII in S3.
AWS PrivateLink
Service that provides private connectivity to AWS services without exposing traffic to the public internet.
AWS KMS
Key Management Service for creating and controlling the encryption keys used to protect data at rest.

HOW TO // AI is not affiliated with or endorsed by Amazon Web Services. AWS Certified Machine Learning Engineer – Associate and MLA-C01 are certifications of Amazon.com, Inc. or its affiliates; we reference them descriptively. All questions are original.
