A data scientist is analyzing a data set with categorical features and would like to make those features more useful when building a model. Which of the following data transformation techniques should the data scientist use? (Choose two.)
A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?
The following graphic shows the results of an unsupervised machine-learning clustering model:
k is the number of clusters, and n is the processing time required to run the model. Which of the following is the best value of k to optimize both accuracy and processing requirements?
Which of the following explains back propagation?
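As a point of reference for the term, here is a minimal sketch of back propagation on a toy two-weight network. It assumes a scalar input, no activation functions, and a squared-error loss; the names `forward`, `backward`, `w1`, `w2`, and `lr` are hypothetical and chosen only for illustration.

```python
def forward(x, w1, w2):
    """Forward pass: input -> hidden value -> output."""
    h = w1 * x   # hidden value
    y = w2 * h   # network output
    return h, y


def backward(x, target, w1, w2):
    """Compute the loss and its gradients by applying the chain rule
    from the output back toward the input."""
    h, y = forward(x, w1, w2)
    loss = (y - target) ** 2
    d_y = 2 * (y - target)   # dLoss/dy at the output
    d_w2 = d_y * h           # gradient for the output weight
    d_h = d_y * w2           # error signal passed back to the hidden value
    d_w1 = d_h * x           # gradient for the first weight
    return loss, d_w1, d_w2


# Hypothetical values, chosen so the arithmetic is easy to follow
x, target = 2.0, 1.0
w1, w2 = 0.5, 0.5
lr = 0.05  # learning rate (assumed)

loss_before, g1, g2 = backward(x, target, w1, w2)
w1, w2 = w1 - lr * g1, w2 - lr * g2   # one gradient-descent step
loss_after, _, _ = backward(x, target, w1, w2)
```

The error at the output is computed first and then reused to obtain the gradient of each earlier weight, which is the "backward" direction the name refers to. After one update step the loss is lower than before.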
In a modeling project, people evaluate phrases and provide reactions as the target variable for the model. Which of the following best describes what this model is doing?
A data scientist is developing a model to predict the outcome of a vote for a national mascot. The choice is between tigers and lions. The full data set represents feedback from individuals representing 17 professions and 12 different locations. The following rank aggregation represents 80% of the data set:
(Screenshot shows survey rankings for just two professions and a few locations, all voting for "Tigers")
Which of the following is the most likely concern about the model's ability to predict the outcome of the vote?
Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?
Which of the following layer sets includes the minimum three layers required to constitute an artificial neural network?
A data analyst is examining the correlation matrix of a new data set to identify issues that could adversely impact model performance. Which of the following is the analyst most likely checking for?
Given matrix
Which of the following is A^T?
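Assuming the question refers to the transpose of a matrix A, the operation can be sketched as follows. The matrix from the question is not reproduced in this text, so the 2x3 matrix below is a hypothetical stand-in, and `transpose` is an illustrative helper rather than a library function.

```python
def transpose(matrix):
    """Return the transpose of a matrix given as a list of rows:
    row i of the input becomes column i of the output."""
    return [list(column) for column in zip(*matrix)]


# Hypothetical 2x3 matrix (NOT the matrix from the question)
A = [[1, 2, 3],
     [4, 5, 6]]

A_T = transpose(A)  # a 3x2 matrix
```

Transposing twice returns the original matrix, which is a quick sanity check when comparing answer choices.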
Which of the following techniques enables automation and iteration of code releases?
Which of the following describes the appropriate use case for PCA?
A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?
Which of the following does k represent in the k-means model?
A team is building a spam detection system. The team wants a probability-based identification method without complex, in-depth training from the historical data set. Which of the following methods would best serve this purpose?
Which of the following is best solved with graph theory?
Which of the following measures would a data scientist most likely use to calculate the similarity of two text strings?
Which of the following issues should a data scientist be most concerned about when generating a synthetic data set?
A data scientist trained a model for departments to share. The departments must access the model using HTTP requests. Which of the following approaches is appropriate?
A data analyst wants to find the latitude and longitude of a mailing address. Which of the following is the best method to use?
A data scientist is building an inferential model with a single predictor variable. A scatter plot of the independent variable against the real-number dependent variable shows a strong relationship between them. The predictor variable is normally distributed with very few outliers. Which of the following algorithms is the best fit for this model, given the data scientist wants the model to be easily interpreted?
Which of the following methods should a data scientist use just before switching to a potential replacement model?
Which of the following modeling tools is appropriate for solving a scheduling problem?
A data scientist wants to predict a person's travel destination. The options are:
Branson, Missouri, United States
Mount Kilimanjaro, Tanzania
Disneyland Paris, Paris, France
Sydney Opera House, Sydney, Australia
Which of the following models would best fit this use case?
A data scientist is using the following confusion matrix to assess model performance:
| | Actually Fails | Actually Succeeds |
|---|---|---|
| Predicted to Fail | 80% | 20% |
| Predicted to Succeed | 15% | 85% |
The model is predicting whether a delivery truck will be able to make 200 scheduled delivery stops.
Every time the model is correct, the company saves 1 hour in planning and scheduling.
Every time the model is wrong, the company loses 4 hours of delivery time.
Which of the following is the net model impact for the company?
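The arithmetic behind this kind of question can be sketched as below. This is an illustration under stated assumptions, not the answer key. It assumes that each row of the confusion matrix sums to 100%, so each percentage is read as a share of the predictions in that row, and that the number of predictions in each row is known. The text does not give that split, so the 100/100 split in the example is hypothetical, as is the function name `net_hours`.

```python
def net_hours(n_pred_fail, n_pred_succeed,
              pct_correct_fail=80, pct_correct_succeed=85,
              hours_saved_per_correct=1, hours_lost_per_wrong=4):
    """Net hours saved (positive) or lost (negative) over all predictions.

    Percentages are row-wise shares from the confusion matrix
    (an assumption: each row sums to 100%).
    """
    # Correct predictions: "predicted fail, actually fails" plus
    # "predicted succeed, actually succeeds"
    correct = (n_pred_fail * pct_correct_fail
               + n_pred_succeed * pct_correct_succeed) / 100
    wrong = (n_pred_fail + n_pred_succeed) - correct
    return correct * hours_saved_per_correct - wrong * hours_lost_per_wrong


# Hypothetical split: 100 predictions in each row (not given in the source)
example = net_hours(100, 100)
```

With a different split between the two rows the result changes, so the row counts need to be pinned down before the net impact can be read off.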