CompTIA DY0-001 CompTIA DataX Exam Practice Test

Demo: 25 questions
Total 85 questions

CompTIA DataX Exam Questions and Answers

Question 1

A data scientist is analyzing a data set with categorical features and would like to make those features more useful when building a model. Which of the following data transformation techniques should the data scientist use? (Choose two.)

Options:

A. Normalization
B. One-hot encoding
C. Linearization
D. Label encoding
E. Scaling
F. Pivoting
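
For readers unfamiliar with the option terms, the sketch below illustrates what label encoding and one-hot encoding do to a categorical column. It uses only the Python standard library; the `colors` column and both helper functions are hypothetical names introduced for illustration, not part of the exam material.

```python
def label_encode(values):
    """Map each distinct category to an integer, using sorted category order."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each value to a 0/1 vector with one position per distinct category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


# Hypothetical categorical feature column (an assumption for this example).
colors = ["red", "green", "blue", "green"]

labels, label_categories = label_encode(colors)
one_hot, one_hot_categories = one_hot_encode(colors)

print(label_categories)  # category order used by both encoders
print(labels)            # one integer per row
print(one_hot)           # one 0/1 vector per row
```

Label encoding keeps a single column but implies an ordering between categories, while one-hot encoding avoids that implied ordering at the cost of one column per category.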

Question 2

A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?

Options:

A. Autoregressive
B. Moving average
C. Dynamic time warping
D. Relative strength

Question 3

The following graphic shows the results of an unsupervised, machine-learning clustering model:

k is the number of clusters, and n is the processing time required to run the model. Which of the following is the best value of k to optimize both accuracy and processing requirements?

Options:

A. 2
B. 10
C. 15
D. 20

Question 4

Which of the following explains back propagation?

Options:

A. The passage of convolutions backward through a neural network to update weights and biases
B. The passage of accuracy backward through a neural network to update weights and biases
C. The passage of nodes backward through a neural network to update weights and biases
D. The passage of errors backward through a neural network to update weights and biases

Question 5

In a modeling project, people evaluate phrases and provide reactions as the target variable for the model. Which of the following best describes what this model is doing?

Options:

A. Sentiment analysis
B. Named-entity recognition
C. TF-IDF vectorization
D. Part-of-speech tagging

Question 6

A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:

(Screenshot shows survey rankings for just two professions and a few locations, all voting for "Tigers")

Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?

Options:

A. Interpolated data
B. Extrapolated data
C. In-sample data
D. Out-of-sample data

Question 7

Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?

Options:

A. Converting an on-premises deployment to a containerized deployment
B. Migrating to a cloud deployment
C. Moving model processing to an edge deployment
D. Adding nodes to a cluster deployment

Question 8

Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?

Options:

A. An input layer, a pooling layer, and an output layer
B. An input layer, a convolutional layer, and a hidden layer
C. An input layer, a hidden layer, and an output layer
D. An input layer, a dropout layer, and a hidden layer

Question 9

A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?

Options:

A. Undersampling
B. Multicollinearity
C. Oversampling
D. Overfitting

Question 10

Given matrix A:

Which of the following is A^T?

Options:

A.

B.

C.

D.
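
As a reminder of what the transpose operation does, the sketch below swaps rows and columns of a small matrix. The matrix and answer choices from the question are not reproduced in this text, so the 2x3 matrix `A` here is a hypothetical example and the `transpose` helper is an illustrative name.

```python
def transpose(matrix):
    """Return the transpose: entry (i, j) of the input becomes entry (j, i)."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical 2x3 matrix; NOT the matrix from the question.
A = [
    [1, 2, 3],
    [4, 5, 6],
]

A_T = transpose(A)

for row in A_T:
    print(row)
```

A matrix with m rows and n columns becomes one with n rows and m columns, so the 2x3 example above yields a 3x2 result.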

Question 11

Which of the following techniques enables automation and iteration of code releases?

Options:

A. Virtualization
B. Markdown
C. Code isolation
D. CI/CD

Question 12

Which of the following describes the appropriate use case for PCA?

Options:

A. Dimensionality reduction
B. Classification
C. Regression
D. Recommendation

Question 13

A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?

Options:

A. SOAP
B. RPC
C. JSON
D. REST

Question 14

Which of the following does k represent in the k-means model?

Options:

A. Number of model tests
B. Number of data splits
C. Number of clusters
D. Distance between features

Question 15

A team is building a spam detection system. The team wants a probability-based identification method without complex, in-depth training from the historical data set. Which of the following methods would best serve this purpose?

Options:

A. Logistic regression
B. Random forest
C. Naive Bayes
D. Linear regression

Question 16

Which of the following is best solved with graph theory?

Options:

A. Optical character recognition
B. Traveling salesman
C. Fraud detection
D. One-armed bandit

Question 17

Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?

Options:

A. Word cloud
B. Edit distance
C. String indexing
D. k-nearest neighbors
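
For reference, one of the listed terms, edit distance, can be sketched as follows. This is a minimal implementation of the Levenshtein variant (insertions, deletions, and substitutions each cost 1), which is an assumption, since the question does not say which edit distance is meant. The function name and sample strings are illustrative.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b via dynamic programming."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,                      # delete char_a
                curr[j - 1] + 1,                  # insert char_b
                prev[j - 1] + substitution_cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("model", "model"))
```

A smaller distance means the two strings are more similar; identical strings have a distance of 0.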

Question 18

Which of the following issues should a data scientist be most concerned about when generating a synthetic data set?

Options:

A. The data set consuming too many resources
B. The data set having insufficient features
C. The data set having insufficient row observations
D. The data set not being representative of the population

Question 19

A data scientist trained a model for departments to share. The departments must access the model using HTTP requests. Which of the following approaches is appropriate?

Options:

A. Utilize distributed computing.
B. Deploy containers.
C. Create an endpoint.
D. Use the File Transfer Protocol.

Question 20

A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?

Options:

A. One-hot encoding
B. Binning
C. Geocoding
D. Imputing

Question 21

A data scientist is building an inferential model with a single predictor variable. A scatter plot of the independent variable against the real-number dependent variable shows a strong relationship between them. The predictor variable is normally distributed with very few outliers. Which of the following algorithms is the best fit for this model, given the data scientist wants the model to be easily interpreted?

Options:

A. A logistic regression
B. An exponential regression
C. A linear regression
D. A probit regression

Question 22

Which of the following methods should a data scientist use just before switching to a potential replacement model?

Options:

A. A/B testing
B. Performance monitoring
C. CI/CD
D. Containerization

Question 23

Which of the following modeling tools is appropriate for solving a scheduling problem?

Options:

A. One-armed bandit
B. Constrained optimization
C. Decision tree
D. Gradient descent

Question 24

A data scientist wants to predict a person's travel destination. The options are:

    Branson, Missouri, United States

    Mount Kilimanjaro, Tanzania

    Disneyland Paris, Paris, France

    Sydney Opera House, Sydney, Australia

Which of the following models would best fit this use case?

Options:

A. Linear discriminant analysis
B. k-means modeling
C. Latent semantic analysis
D. Principal component analysis

Question 25

A data scientist is using the following confusion matrix to assess model performance:

                        Actually Fails    Actually Succeeds
Predicted to Fail       80%               20%
Predicted to Succeed    15%               85%

The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.

Every time the model is correct, the company saves 1 hour in planning and scheduling.

Every time the model is wrong, the company loses 4 hours of delivery time.

Which of the following is the net model impact for the company?

Options:

A. 25 hours lost
B. 25 hours saved
C. 165 hours lost
D. 165 hours saved
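
One way to work through the arithmetic is sketched below. It rests on two assumptions that the question does not state: each of the 200 stops counts as one prediction, and the predictions split evenly between the "Predicted to Fail" and "Predicted to Succeed" rows (100 each). All variable names are illustrative.

```python
TOTAL_PREDICTIONS = 200        # ASSUMPTION: one prediction per scheduled stop
HOURS_SAVED_PER_CORRECT = 1    # from the question
HOURS_LOST_PER_WRONG = 4       # from the question

# ASSUMPTION: predictions split evenly between the two predicted classes.
predicted_fail = TOTAL_PREDICTIONS // 2
predicted_succeed = TOTAL_PREDICTIONS - predicted_fail

# Correct cells of the confusion matrix, as whole-number percentages:
# 80% of "Predicted to Fail" actually fail,
# 85% of "Predicted to Succeed" actually succeed.
correct = (80 * predicted_fail + 85 * predicted_succeed) // 100
wrong = TOTAL_PREDICTIONS - correct

net_hours = correct * HOURS_SAVED_PER_CORRECT - wrong * HOURS_LOST_PER_WRONG

print(f"correct={correct}, wrong={wrong}, net_hours={net_hours}")
```

A positive `net_hours` means hours saved and a negative value means hours lost. A different split between the two predicted rows would change the counts, so the result depends on the even-split assumption.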
