Amazon Web Services MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Practice Test

Demo: 62 questions
Total 230 questions

Get MLA-C01 Full Access Download MLA-C01 PDF

AWS Certified Machine Learning Engineer - Associate Questions and Answers

Question 1

A company wants to host an ML model on Amazon SageMaker. An ML engineer is configuring a continuous integration and continuous delivery (Cl/CD) pipeline in AWS CodePipeline to deploy the model. The pipeline must run automatically when new training data for the model is uploaded to an Amazon S3 bucket.

Select and order the pipeline's correct steps from the following list. Each step should be selected one time or not at all. (Select and order three.)

• An S3 event notification invokes the pipeline when new data is uploaded.

• S3 Lifecycle rule invokes the pipeline when new data is uploaded.

• SageMaker retrains the model by using the data in the S3 bucket.

• The pipeline deploys the model to a SageMaker endpoint.

• The pipeline deploys the model to SageMaker Model Registry.

Options:

Question 2

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data.

Which solution will meet this requirement with the LEAST operational effort?

Options:

Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.

Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.

Use AWS Glue DataBrew built-in features to oversample the minority class.

Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.

Question 3

An ML engineer is using Amazon SageMaker Canvas to build a custom ML model from an imported dataset. The model must make continuous numeric predictions based on 10 years of data.

Which metric should the ML engineer use to evaluate the model’s performance?

Options:

Accuracy

InferenceLatency

Area Under the ROC Curve (AUC)

Root Mean Square Error (RMSE)

Question 4

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

Options:

Capture AWS CloudTrail metrics from SageMaker Clarify.

Capture Amazon CloudWatch metrics from SageMaker Clarify.

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

Capture SageMaker Model Monitor metrics from Amazon SNS.

Question 5

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

Options:

Enable network isolation.

Configure traffic encryption by using security groups.

Enable inter-container traffic encryption.

Enable VPC flow logs.

Question 6

A company stores historical data in .csv files in Amazon S3. Only some of the rows and columns in the .csv files are populated. The columns are not labeled. An ML

engineer needs to prepare and store the data so that the company can use the data to train ML models.

Select and order the correct steps from the following list to perform this task. Each step should be selected one time or not at all. (Select and order three.)

• Create an Amazon SageMaker batch transform job for data cleaning and feature engineering.

• Store the resulting data back in Amazon S3.

• Use Amazon Athena to infer the schemas and available columns.

• Use AWS Glue crawlers to infer the schemas and available columns.

• Use AWS Glue DataBrew for data cleaning and feature engineering.

Options:

Answer:

Explanation:

Step 1: Use AWS Glue crawlers to infer the schemas and available columns.

Step 2: Use AWS Glue DataBrew for data cleaning and feature engineering.

Step 3: Store the resulting data back in Amazon S3.

Step 1: Use AWS Glue Crawlers to Infer Schemas and Available Columns

Why? The data is stored in .csv files with unlabeled columns, and Glue Crawlers can scan the raw data in Amazon S3 to automatically infer the schema, including available columns, data types, and any missing or incomplete entries.

How? Configure AWS Glue Crawlers to point to the S3 bucket containing the .csv files, and run the crawler to extract metadata. The crawler creates a schema in the AWS Glue Data Catalog, which can then be used for subsequent transformations.

Step 2: Use AWS Glue DataBrew for Data Cleaning and Feature Engineering

Why? Glue DataBrew is a visual data preparation tool that allows for comprehensive cleaning and transformation of data. It supports imputation of missing values, renaming columns, feature engineering, and more without requiring extensive coding.

How? Use Glue DataBrew to connect to the inferred schema from Step 1 and perform data cleaning and feature engineering tasks like filling in missing rows/columns, renaming unlabeled columns, and creating derived features.

Step 3: Store the Resulting Data Back in Amazon S3

Why? After cleaning and preparing the data, it needs to be saved back to Amazon S3 so that it can be used for training machine learning models.

How? Configure Glue DataBrew to export the cleaned data to a specific S3 bucket location. This ensures the processed data is readily accessible for ML workflows.

Order Summary:

Use AWS Glue crawlers to infer schemas and available columns.

Use AWS Glue DataBrew for data cleaning and feature engineering.

Store the resulting data back in Amazon S3.

This workflow ensures that the data is prepared efficiently for ML model training while leveraging AWS services for automation and scalability.

Question 7

A company plans to use Amazon SageMaker AI to build image classification models. The company has 6 TB of training data stored on Amazon FSx for NetApp ONTAP. The file system is in the same VPC as SageMaker AI.

An ML engineer must make the training data accessible to SageMaker AI training jobs.

Which solution will meet these requirements?

Options:

Mount the FSx for ONTAP file system as a volume to the SageMaker AI instance.

Create an Amazon S3 bucket and use Mountpoint for Amazon S3 to link the bucket to FSx for ONTAP.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question 8

An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.

Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)

• Access the store to build datasets for training.

• Create a feature group.

• Ingest the records.

Options:

Question 9

A company is using Amazon SageMaker to create ML models. The company's data scientists need fine-grained control of the ML workflows that they orchestrate. The data scientists also need the ability to visualize SageMaker jobs and workflows as a directed acyclic graph (DAG). The data scientists must keep a running history of model discovery experiments and must establish model governance for auditing and compliance verifications.

Which solution will meet these requirements?

Options:

Use AWS CodePipeline and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

Use AWS CodePipeline and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Use SageMaker Pipelines and its integration with SageMaker Studio to manage the entire ML workflows. Use SageMaker ML Lineage Tracking for the running history of experiments and for auditing and compliance verifications.

Use SageMaker Pipelines and its integration with SageMaker Experiments to manage the entire ML workflows. Use SageMaker Experiments for the running history of experiments and for auditing and compliance verifications.

Question 10

A company ingests sales transaction data using Amazon Data Firehose into Amazon OpenSearch Service. The Firehose buffer interval is set to 60 seconds.

The company needs sub-second latency for a real-time OpenSearch dashboard.

Which architectural change will meet this requirement?

Options:

Use zero buffering in the Firehose stream and tune the PutRecordBatch batch size.

Replace Firehose with AWS DataSync and enhanced fan-out consumers.

Increase the Firehose buffer interval to 120 seconds.

Replace Firehose with Amazon SQS.

Question 11

A company wants to use Amazon SageMaker AI to host an ML model that runs on CPU for real-time predictions. The model has intermittent traffic during business hours and periods of no traffic after business hours.

Which hosting option will serve inference requests in the MOST cost-effective manner?

Options:

Deploy the model to a real-time endpoint with scheduled auto scaling.

Deploy the model to a SageMaker AI Serverless Inference endpoint with provisioned concurrency during business hours.

Deploy the model to an asynchronous inference endpoint with auto scaling to zero.

Deploy the model to a real-time endpoint and activate it only during business hours using AWS Lambda.

Question 12

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.

Question 13

An ML engineer needs to implement a solution to host a trained ML model. The rate of requests to the model will be inconsistent throughout the day.

The ML engineer needs a scalable solution that minimizes costs when the model is not in use. The solution also must maintain the model's capacity to respond to requests during times of peak usage.

Which solution will meet these requirements?

Options:

Create AWS Lambda functions that have fixed concurrency to host the model. Configure the Lambda functions to automatically scale based on the number of requests to the model.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Set a static number of tasks to handle requests during times of peak usage.

Deploy the model to an Amazon SageMaker endpoint. Deploy multiple copies of the model to the endpoint. Create an Application Load Balancer to route traffic between the different copies of the model at the endpoint.

Deploy the model to an Amazon SageMaker endpoint. Create SageMaker endpoint auto scaling policies that are based on Amazon CloudWatch metrics to adjust the number of instances dynamically.

Question 14

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant's answers.

Which solution will meet these requirements?

Options:

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

Use a reranker model before passing the articles to the foundation model (FM).

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Question 15

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model's performance?

Options:

Accuracy

Area Under the ROC Curve (AUC)

F1 score

Mean absolute error (MAE)

Question 16

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Question 17

A company has developed a new ML model. The company requires online model validation on 10% of the traffic before the company fully releases the model in production. The company uses an Amazon SageMaker endpoint behind an Application Load Balancer (ALB) to serve the model.

Which solution will set up the required online validation with the LEAST operational overhead?

Options:

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 0.1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Use production variants to add the new model to the existing SageMaker endpoint. Set the variant weight to 1 for the new model. Monitor the number of invocations by using Amazon CloudWatch.

Create a new SageMaker endpoint. Use production variants to add the new model to the new endpoint. Monitor the number of invocations by using Amazon CloudWatch.

Configure the ALB to route 10% of the traffic to the new model at the existing SageMaker endpoint. Monitor the number of invocations by using AWS CloudTrail.

Answer:

Explanation:

Scenario: The company wants to perform online validation of a new ML model on 10% of the traffic before fully deploying the model in production. The setup must have minimal operational overhead.

Why Use SageMaker Production Variants?

Built-In Traffic Splitting: Amazon SageMaker endpoints support production variants, allowing multiple models to run on a single endpoint. You can direct a percentage of incoming traffic to each variant by adjusting the variant weights.

Ease of Management: Using production variants eliminates the need for additional infrastructure like separate endpoints or custom ALB configurations.

Monitoring with CloudWatch: SageMaker automatically integrates with CloudWatch, enabling real-time monitoring of model performance and invocation metrics.

Steps to Implement:

Deploy the New Model as a Production Variant:

Update the existing SageMaker endpoint to include the new model as a production variant. This can be done via the SageMaker console, CLI, or SDK.

Example SDK Code:

import boto3

sm_client = boto3.client('sagemaker')

response = sm_client.update_endpoint_weights_and_capacities(

EndpointName='existing-endpoint-name',

DesiredWeightsAndCapacities=[

{'VariantName': 'current-model', 'DesiredWeight': 0.9},

{'VariantName': 'new-model', 'DesiredWeight': 0.1}

]

)

Set the Variant Weight:

Assign a weight of 0.1 to the new model and 0.9 to the existing model. This ensures 10% of traffic goes to the new model while the remaining 90% continues to use the current model.

Monitor the Performance:

Use Amazon CloudWatch metrics, such as InvocationCount and ModelLatency, to monitor the traffic and performance of each variant.

Validate the Results:

Analyze the performance of the new model based on metrics like accuracy, latency, and failure rates.

Why Not the Other Options?

Option B: Setting the weight to 1 directs all traffic to the new model, which does not meet the requirement of splitting traffic for validation.

Option C: Creating a new endpoint introduces additional operational overhead for traffic routing and monitoring, which is unnecessary given SageMaker's built-in production variant capability.

Option D: Configuring the ALB to route traffic requires manual setup and lacks SageMaker's seamless variant monitoring and traffic splitting features.

Conclusion:

Using production variants with a weight of 0.1 for the new model on the existing SageMaker endpoint provides the required traffic split for online validation with minimal operational overhead.

[References:, Amazon SageMaker Endpoints, SageMaker Production Variants, Monitoring SageMaker Endpoints with CloudWatch, , , ]

Question 18

A company needs to analyze a large dataset that is stored in Amazon S3 in Apache Parquet format. The company wants to use one-hot encoding for some of the columns.

The company needs a no-code solution to transform the data. The solution must store the transformed data back to the same S3 bucket for model training.

Which solution will meet these requirements?

Options:

Configure an AWS Glue DataBrew project that connects to the data. Use the DataBrew interactive interface to create a recipe that performs the one-hot encoding transformation. Create a job to apply the transformation and write the output back to an S3 bucket.

Use Amazon Athena SQL queries to perform the one-hot encoding transformation.

Use an AWS Glue ETL interactive notebook to perform the transformation.

Use Amazon Redshift Spectrum to perform the transformation.

Question 19

A company stores time-series data about user clicks in an Amazon S3 bucket. The raw data consists of millions of rows of user activity every day. ML engineers access the data to develop their ML models.

The ML engineers need to generate daily reports and analyze click trends over the past 3 days by using Amazon Athena. The company must retain the data for 30 days before archiving the data.

Which solution will provide the HIGHEST performance for data retrieval?

Options:

Keep all the time-series data without partitioning in the S3 bucket. Manually move data that is older than 30 days to separate S3 buckets.

Create AWS Lambda functions to copy the time-series data into separate S3 buckets. Apply S3 Lifecycle policies to archive data that is older than 30 days to S3 Glacier Flexible Retrieval.

Organize the time-series data into partitions by date prefix in the S3 bucket. Apply S3 Lifecycle policies to archive partitions that are older than 30 days to S3 Glacier Flexible Retrieval.

Put each day's time-series data into its own S3 bucket. Use S3 Lifecycle policies to archive S3 buckets that hold data that is older than 30 days to S3 Glacier Flexible Retrieval.

Question 20

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

Options:

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.

Question 21

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

Options:

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question 22

An ML engineer is using a training job to fine-tune a deep learning model in Amazon SageMaker Studio. The ML engineer previously used the same pre-trained model with a similar

dataset. The ML engineer expects vanishing gradient, underutilized GPU, and overfitting problems.

The ML engineer needs to implement a solution to detect these issues and to react in predefined ways when the issues occur. The solution also must provide comprehensive real-time metrics during the training.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use TensorBoard to monitor the training job. Publish the findings to an Amazon Simple Notification Service (Amazon SNS) topic. Create an AWS Lambda function to consume the findings and to initiate the predefined actions.

Use Amazon CloudWatch default metrics to gain insights about the training job. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

Expand the metrics in Amazon CloudWatch to include the gradients in each training step. Use the metrics to invoke an AWS Lambda function to initiate the predefined actions.

Use SageMaker Debugger built-in rules to monitor the training job. Configure the rules to initiate the predefined actions.

Question 23

A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.

The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

Options:

Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.

Question 24

A company wants to migrate ML models from an on-premises environment to Amazon SageMaker AI. The models are based on the PyTorch algorithm. The company needs to reuse its existing custom scripts as much as possible.

Which SageMaker AI feature should the company use?

Options:

SageMaker AI built-in algorithms

SageMaker Canvas

SageMaker JumpStart

SageMaker AI script mode

Question 25

A healthcare company wants to detect irregularities in patient vital signs that could indicate early signs of a medical condition. The company has an unlabeled dataset that includes patient health records, medication history, and lifestyle changes.

Which algorithm and hyperparameter should the company use to meet this requirement?

Options:

Use the Amazon SageMaker AI XGBoost algorithm. Set max_depth to greater than 100 to regulate tree complexity.

Use the Amazon SageMaker AI k-means clustering algorithm. Set k to determine the number of clusters.

Use the Amazon SageMaker AI DeepAR algorithm. Set epochs to the number of training iterations.

Use the Amazon SageMaker AI Random Cut Forest (RCF) algorithm. Set num_trees to greater than 100.

Question 26

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

Options:

Remove the outliers and perform a log transformation on the Square Meters variable.

Keep the outliers and perform normalization on the Square Meters variable.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

Keep the outliers and perform one-hot encoding on the Square Meters variable.

Question 27

A company wants to use large language models (LLMs) supported by Amazon Bedrock to develop a chat interface for internal technical documentation.

The documentation consists of dozens of text files totaling several megabytes and is updated frequently.

Which solution will meet these requirements MOST cost-effectively?

Options:

Train a new LLM in Amazon Bedrock using the documentation.

Use Amazon Bedrock guardrails to integrate documentation.

Fine-tune an LLM in Amazon Bedrock with the documentation.

Upload the documentation to an Amazon Bedrock knowledge base and use it as context during inference.

Question 28

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from dosed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

Options:

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Question 29

A company stores training data as a .csv file in an Amazon S3 bucket. The company must encrypt the data and must control which applications have access to the encryption key.

Which solution will meet these requirements?

Options:

Create a new SSH access key and use the AWS Encryption CLI to encrypt the file.

Create a new API key by using Amazon API Gateway and use it to encrypt the file.

Create a new IAM role with permissions for kms:GenerateDataKey and use the role to encrypt the file.

Create a new AWS Key Management Service (AWS KMS) key and use the AWS Encryption CLI with the KMS key to encrypt the file.

Question 30

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report. Select and order the correct steps from the following list to model monitor actions. Select each step one time. (Select and order THREE.) .

Check the analysis results on the SageMaker Studio console. .

Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

Schedule an hourly model explainability monitor.

Options:

Question 31

Case study

The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.

Which action will meet this requirement with the LEAST operational overhead?

Options:

Use AWS Glue to transform the categorical data into numerical data.

Use AWS Glue to transform the numerical data into categorical data.

Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Answer:

Explanation:

Preparing a training dataset that includes both categorical and numerical data is essential for maximizing the accuracy of a machine learning model. Transforming categorical data into numerical format is a critical step, as most ML algorithms require numerical input.

Why Transform Categorical Data into Numerical Data?

Model Compatibility: Many ML algorithms cannot process categorical data directly and require numerical representations.

Improved Performance: Proper encoding of categorical variables can enhance model accuracy and convergence speed.

Why Use Amazon SageMaker Data Wrangler?

Amazon SageMaker Data Wrangler offers a visual interface with over 300 built-in data transformations, including tools for encoding categorical variables.

Implementation Steps:

Import Data:

Load the dataset into SageMaker Data Wrangler from sources like Amazon S3 or on-premises databases.

Identify Categorical Features:

Use Data Wrangler's data type inference to detect categorical columns.

Apply Categorical Encoding:

Choose appropriate encoding techniques (e.g., one-hot encoding or ordinal encoding) from Data Wrangler's transformation options.

Apply the selected transformation to convert categorical features into numerical format.

Validate Transformations:

Review the transformed dataset to ensure accuracy and completeness.

Advantages of Using SageMaker Data Wrangler:

Ease of Use: Provides a user-friendly interface for data transformation without extensive coding.

Operational Efficiency: Integrates data preparation steps, reducing the need for multiple tools and minimizing operational overhead.

Flexibility: Supports various data sources and transformation techniques, accommodating diverse datasets.

By utilizing SageMaker Data Wrangler to transform categorical data into numerical format, the ML engineer can efficiently prepare the dataset, thereby enhancing the model's accuracy with minimal operational overhead.

Transform Data - Amazon SageMaker

Prepare ML Data with Amazon SageMaker Data Wrangler

Question 32

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

Options:

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Question 33

An ML engineer is developing a classification model. The ML engineer needs to use custom libraries in processing jobs, training jobs, and pipelines in Amazon SageMaker AI.

Which solution will provide this functionality with the LEAST implementation effort?

Options:

Manually install the libraries in the SageMaker AI containers.

Build a custom Docker container that includes the required libraries. Host the container in Amazon Elastic Container Registry (Amazon ECR). Use the ECR image in the SageMaker AI jobs and pipelines.

Use a SageMaker AI notebook instance and install libraries at startup.

Run code externally on Amazon EC2 and import results into SageMaker AI.

Question 34

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

Options:

Configure the competitor's name as a blocked phrase in Amazon Q Business.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Question 35

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

Options:

Lack of training data

Drift in production data distribution

Compute resource constraints

Model overfitting

Question 36

A company has significantly increased the amount of data that is stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than they used to take.

An ML engineer must implement a solution to optimize the data for query performance.

Which solution will meet this requirement with the LEAST operational overhead?

Options:

Configure an AWS Lambda function to split the .csv files into smaller objects in the S3 bucket.

Configure an AWS Glue job to drop columns that have string type values and to save the results to the S3 bucket.

Configure an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Apache Parquet format.

Configure an Amazon EMR cluster to process the data that is in the S3 bucket.

Question 37

A company that has hundreds of data scientists is using Amazon SageMaker to create ML models. The models are in model groups in the SageMaker Model Registry.

The data scientists are grouped into three categories: computer vision, natural language processing (NLP), and speech recognition. An ML engineer needs to implement a solution to organize the existing models into these groups to improve model discoverability at scale. The solution must not affect the integrity of the model artifacts and their existing groupings.

Which solution will meet these requirements?

Options:

Create a custom tag for each of the three categories. Add the tags to the model packages in the SageMaker Model Registry.

Create a model group for each category. Move the existing models into these category model groups.

Use SageMaker ML Lineage Tracking to automatically identify and tag which model groups should contain the models.

Create a Model Registry collection for each of the three categories. Move the existing model groups into the collections.

Question 38

An ML engineer needs to create data ingestion pipelines and ML model deployment pipelines on AWS. All the raw data is stored in Amazon S3 buckets.

Which solution will meet these requirements?

Options:

Use Amazon Data Firehose to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use AWS Glue to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Redshift ML to create the data ingestion pipelines. Use Amazon SageMaker Studio Classic to create the model deployment pipelines.

Use Amazon Athena to create the data ingestion pipelines. Use an Amazon SageMaker notebook to create the model deployment pipelines.

Question 39

An ML engineer is setting up a CI/CD pipeline for an ML workflow in Amazon SageMaker AI. The pipeline must automatically retrain, test, and deploy a model whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer also needs to track model versions for auditing.

Which solution will meet these requirements?

Options:

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and track model versions.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

Use AWS Lambda and Amazon EventBridge to retrain and deploy the model and track versions via logs.

Manually retrain and deploy the model using SageMaker notebook instances and track versions with AWS CloudTrail.

Question 40

A company uses a hybrid cloud environment. A model that is deployed on premises uses data in Amazon 53 to provide customers with a live conversational engine.

The model is using sensitive data. An ML engineer needs to implement a solution to identify and remove the sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy the model on Amazon SageMaker. Create a set of AWS Lambda functions to identify and remove the sensitive data.

Deploy the model on an Amazon Elastic Container Service (Amazon ECS) cluster that uses AWS Fargate. Create an AWS Batch job to identify and remove the sensitive data.

Use Amazon Macie to identify the sensitive data. Create a set of AWS Lambda functions to remove the sensitive data.

Use Amazon Comprehend to identify the sensitive data. Launch Amazon EC2 instances to remove the sensitive data.

Question 41

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

Options:

Use Amazon Kendra Experience Builder.

Use Amazon Aurora PostgreSQL with the pgvector extension.

Use Amazon RDS for PostgreSQL with the pgvector extension.

Use the AWS Glue Data Catalog metadata repository.

Question 42

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

Options:

Deploy on EC2 Auto Scaling behind an ALB.

Deploy to a SageMaker AI real-time endpoint.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

Deploy to Amazon ECS on EC2.

Question 43

A credit card company has a fraud detection model in production on an Amazon SageMaker endpoint. The company develops a new version of the model. The company needs to assess the new model's performance by using live data and without affecting production end users.

Which solution will meet these requirements?

Options:

Set up SageMaker Debugger and create a custom rule.

Set up blue/green deployments with all-at-once traffic shifting.

Set up blue/green deployments with canary traffic shifting.

Set up shadow testing with a shadow variant of the new model.

Question 44

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

Options:

Use Amazon Made to categorize the sensitive data.

Prepare the data by using AWS Glue DataBrew.

Run an AWS Batch job to change the sensitive data to random values.

Run an Amazon EMR job to change the sensitive data to random values.

Question 45

An ML engineer is developing a fraud detection model by using the Amazon SageMaker XGBoost algorithm. The model classifies transactions as either fraudulent or legitimate.

During testing, the model excels at identifying fraud in the training dataset. However, the model is inefficient at identifying fraud in new and unseen transactions.

What should the ML engineer do to improve the fraud detection for new transactions?

Options:

Increase the learning rate.

Remove some irrelevant features from the training dataset.

Increase the value of the max_depth hyperparameter.

Decrease the value of the max_depth hyperparameter.

Question 46

An ML engineer at a credit card company built and deployed an ML model by using Amazon SageMaker AI. The model was trained on transaction data that contained very few fraudulent transactions. After deployment, the model is underperforming.

What should the ML engineer do to improve the model’s performance?

Options:

Retrain the model with a different SageMaker built-in algorithm.

Use random undersampling to reduce the majority class and retrain the model.

Use Synthetic Minority Oversampling Technique (SMOTE) to generate synthetic minority samples and retrain the model.

Use random oversampling to duplicate minority samples and retrain the model.

Question 47

Which solution will meet this requirement?

Options:

Configure the competitor's name as a blocked phrase in Amazon Q Business.

Configure an Amazon Q Business retriever to exclude the competitor's name.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Question 48

A company has used Amazon SageMaker to deploy a predictive ML model in production. The company is using SageMaker Model Monitor on the model. After a model update, an ML engineer notices data quality issues in the Model Monitor checks.

What should the ML engineer do to mitigate the data quality issues that Model Monitor has identified?

Options:

Adjust the model's parameters and hyperparameters.

Initiate a manual Model Monitor job that uses the most recent production data.

Create a new baseline from the latest dataset. Update Model Monitor to use the new baseline for evaluations.

Include additional data in the existing training set for the model. Retrain and redeploy the model.

Question 49

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model’s performance to the currently deployed model's performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

Options:

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.

Question 50

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The dataset contains two string variables: location and job_seniority_level.

The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

Options:

Apply tokenization to location. Apply ordinal encoding to job_seniority_level.

Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level.

Apply binning to location. Apply standard scaling to job_seniority_level.

Apply one-hot encoding to location. Apply standard scaling to job_seniority_level.

Question 51

An ML engineer has an Amazon Comprehend custom model in Account A in the us-east-1 Region. The ML engineer needs to copy the model to Account В in the same Region.

Which solution will meet this requirement with the LEAST development effort?

Options:

Use Amazon S3 to make a copy of the model. Transfer the copy to Account B.

Create a resource-based IAM policy. Use the Amazon Comprehend ImportModel API operation to copy the model to Account B.

Use AWS DataSync to replicate the model from Account A to Account B.

Create an AWS Site-to-Site VPN connection between Account A and Account В to transfer the model.

Question 52

A company wants to improve the sustainability of its ML operations.

Which actions will reduce the energy usage and computational resources that are associated with the company's training jobs? (Choose two.)

Options:

Use Amazon SageMaker Debugger to stop training jobs when non-converging conditions are detected.

Use Amazon SageMaker Ground Truth for data labeling.

Deploy models by using AWS Lambda functions.

Use AWS Trainium instances for training.

Use PyTorch or TensorFlow with the distributed training option.

Question 53

A digital media entertainment company needs real-time video content moderation to ensure compliance during live streaming events.

Which solution will meet these requirements with the LEAST operational overhead?

Options:

Use Amazon Rekognition and AWS Lambda to extract and analyze the metadata from the videos' image frames.

Use Amazon Rekognition and a large language model (LLM) hosted on Amazon Bedrock to extract and analyze the metadata from the videos’ image frames.

Use Amazon SageMaker AI to extract and analyze the metadata from the videos' image frames.

Use Amazon Transcribe and Amazon Comprehend to extract and analyze the metadata from the videos' image frames.

Question 54

A company's ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

Options:

Binary encoding

Label encoding

One-hot encoding

Tokenization

Question 55

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

Options:

Run the primary node, core nodes, and task nodes on On-Demand Instances.

Run the primary node, core nodes, and task nodes on Spot Instances.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question 56

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

Options:

Perform ordinal encoding to represent categories of the feature.

Perform similarity encoding to represent categories of the feature.

Perform one-hot encoding to represent categories of the feature.

Perform target encoding to represent categories of the feature.

Question 57

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.

The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.

Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

Options:

The ML engineer and the Canvas user must be in separate SageMaker domains.

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

The model must be registered in the SageMaker Model Registry.

The ML engineer must host the model on AWS Marketplace.

The ML engineer must deploy the model to a SageMaker endpoint.

Question 58

A travel company has trained hundreds of geographic data models to answer customer questions by using Amazon SageMaker AI. Each model uses its own inferencing endpoint, which has become an operational challenge for the company.

The company wants to consolidate the models' inferencing endpoints to reduce operational overhead.

Which solution will meet these requirements?

Options:

Use SageMaker AI multi-model endpoints. Deploy a single endpoint.

Use SageMaker AI multi-container endpoints. Deploy a single endpoint.

Use Amazon SageMaker Studio. Deploy a single-model endpoint.

Use inference pipelines in SageMaker AI to combine tasks from hundreds of models to 15 models.

Question 59

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model's performance? (Select TWO.)

Options:

Increase the model complexity by increasing the number of features in the dataset.

Decrease the model complexity by reducing the number of features in the dataset.

Decrease the model complexity by reducing the number of samples in the dataset.

Increase the value of the L2 regularization parameter.

Decrease the value of the L2 regularization parameter.

Question 60

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records per second.

The company needs a scalable AWS solution to identify anomalous data points with the LEAST operational overhead.

Which solution will meet these requirements?

Options:

Ingest data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to detect anomalies.

Ingest data into Kinesis Data Streams. Deploy a SageMaker AI endpoint and use AWS Lambda to detect anomalies.

Ingest data into Apache Kafka on Amazon EC2 and use SageMaker AI for detection.

Send data to Amazon SQS and use AWS Glue ETL jobs for batch anomaly detection.

Question 61

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

Options:

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Question 62

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

Options:

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network Ad for the subnet where the domain is located.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Load More MLA-C01 Questions

Demo: 62 questions
Total 230 questions

Get MLA-C01 Full Access Download MLA-C01 PDF

Pre-Summer Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: 70percent

Amazon Web Services MLA-C01 AWS Certified Machine Learning Engineer - Associate Exam Practice Test

AWS Certified Machine Learning Engineer - Associate Questions and Answers

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation:

Options:

Answer:

Explanation: