Question 1

Drag the adjustment formulas for oversampling from the left and place them in the correct locations in the confusion matrix shown on the right.
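For reference, one common way to write the population-adjusted confusion matrix expresses each cell as a proportion of the population. It uses the sample sensitivity Se, the sample specificity Sp, and the population event proportion π₁ (with π₀ = 1 − π₁). These symbol names are assumptions for this sketch, not labels taken from the exhibit.

```latex
\begin{array}{l|cc}
 & \text{Predicted } 0 & \text{Predicted } 1 \\ \hline
\text{Actual } 0 & \pi_0 \, \mathrm{Sp} & \pi_0 \, (1 - \mathrm{Sp}) \\
\text{Actual } 1 & \pi_1 \, (1 - \mathrm{Se}) & \pi_1 \, \mathrm{Se}
\end{array}
```

Each actual-class row is rescaled by its own prior, so sensitivity and specificity are unchanged by the adjustment. Measures that combine rows, such as positive predictive value, do change.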

Question 2

Within PROC GLM, the interaction between the two categorical predictors, Income and Gender, was shown to be significant. An item store was saved from the GLM analysis.

Which statement from PROC PLM would test the significance of Gender within each level of Income and adjust for multiple tests?

Question 3

What is a benefit of performing data cleansing (imputation, transformations, etc.) after partitioning the data for honest assessment, as opposed to performing it before partitioning?
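Here is a minimal sketch of the idea. It assumes median imputation as the cleansing step, which is a hypothetical choice; any fitted transformation works the same way. The fill value is learned from the training partition alone and then applied unchanged to the validation partition. The validation data therefore still behaves like unseen data, and the assessment stays honest.

```python
from statistics import median

def fit_median_imputer(train_values):
    """Learn the fill value from the training partition only."""
    observed = [v for v in train_values if v is not None]
    return median(observed)

def impute(values, fill_value):
    """Replace missing entries (None) with a previously learned fill value."""
    return [fill_value if v is None else v for v in values]

# Hypothetical partitions of one predictor; None marks a missing value.
train = [10.0, None, 14.0, 12.0, None, 20.0]
valid = [None, 1000.0, 11.0]

fill = fit_median_imputer(train)   # median of 10, 12, 14, 20
train_clean = impute(train, fill)
valid_clean = impute(valid, fill)  # the extreme 1000.0 never influenced `fill`
```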

Question 4

Refer to the lift chart:

What does the reference line at lift = 1 correspond to?
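Lift at a given depth is the event rate among the top-ranked cases divided by the overall event rate. A lift of 1 is therefore what selection with no ranking ability (random selection) would achieve. The sketch below uses hypothetical scores and targets.

```python
def cumulative_lift(scores, targets, depth):
    """Lift in the top `depth` fraction of cases ranked by score:
    (event rate among the selected cases) / (overall event rate)."""
    ranked = sorted(zip(scores, targets), key=lambda st: st[0], reverse=True)
    n_selected = max(1, round(depth * len(ranked)))
    selected_rate = sum(t for _, t in ranked[:n_selected]) / n_selected
    overall_rate = sum(targets) / len(targets)
    return selected_rate / overall_rate

# Hypothetical model scores and 1/0 targets.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

top_20 = cumulative_lift(scores, targets, 0.2)  # both top cases are events
whole = cumulative_lift(scores, targets, 1.0)   # selecting everyone
```

Selecting the whole population always reproduces the overall rate, so lift at 100% depth is 1 for any model.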

Question 5

Refer to the confusion matrix:

An analyst determines that loan defaults occur at the rate of 3% in the overall population. The above confusion matrix is from an oversampled test set (1 = default).

What is the sensitivity adjusted for the population event probability?

Enter your answer in the space below. Round to three decimal places (example: n.nnn).
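The exhibit's counts are not reproduced here, so this sketch uses hypothetical counts together with the 3% population default rate from the question. It applies the row-wise prior adjustment. Sensitivity is computed within the actual-event row, so it comes out the same before and after adjustment. Positive predictive value does not.

```python
def adjusted_rates(tn, fp, fn, tp, pi1):
    """Population-adjusted metrics from an oversampled confusion matrix.

    Each actual-class row is rescaled to its population prior:
    events to pi1, non-events to pi0 = 1 - pi1.
    """
    pi0 = 1.0 - pi1
    se = tp / (tp + fn)  # sensitivity: within the event row
    sp = tn / (tn + fp)  # specificity: within the non-event row
    adj_tp = pi1 * se
    adj_fn = pi1 * (1.0 - se)
    adj_fp = pi0 * (1.0 - sp)
    adj_tn = pi0 * sp    # completes the adjusted matrix (unused below)
    adj_se = adj_tp / (adj_tp + adj_fn)
    adj_ppv = adj_tp / (adj_tp + adj_fp)
    return se, adj_se, adj_ppv

# Hypothetical oversampled test-set counts; pi1 = 0.03 comes from the question.
se, adj_se, adj_ppv = adjusted_rates(tn=400, fp=100, fn=150, tp=350, pi1=0.03)
```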

Question 6

An analyst fits a logistic regression model to predict whether or not a client will default on a loan. One of the predictors in the model is agent, and each agent serves 15 to 20 clients. The model fails to converge. The analyst prints the summarized data, showing the number of defaulted loans per agent. See the partial output below:

What is the most likely reason that the model fails to converge?
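With only 15 to 20 clients per agent, some agents may have no defaults, or only defaults. For those agents the maximum likelihood estimates diverge, a condition known as quasi-complete separation. Here is a sketch of the check, using hypothetical (agent, default) pairs.

```python
from collections import defaultdict

def separated_levels(rows):
    """Return the levels of a categorical predictor whose cases all share
    one outcome (all 0 or all 1); such levels cause quasi-complete separation."""
    counts = defaultdict(lambda: [0, 0])  # level -> [non-defaults, defaults]
    for level, default in rows:
        counts[level][default] += 1
    return sorted(level for level, (n0, n1) in counts.items() if n0 == 0 or n1 == 0)

# Hypothetical summarized data: (agent, default indicator)
rows = [("A", 0), ("A", 1), ("A", 0),
        ("B", 0), ("B", 0), ("B", 0),
        ("C", 1), ("C", 1)]
problem_agents = separated_levels(rows)
```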

Question 7

Which method is NOT an appropriate way to score new observations with a known target in a logistic regression model?
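Whichever SAS mechanism does the scoring (for example, a SCORE statement, PROC PLM with a saved item store, or generated DATA step code), the underlying computation is the same. The saved coefficients are applied to the new observation's predictor values, and the linear predictor is passed through the logistic function. Here is a minimal sketch with hypothetical coefficients. The known target plays no part in computing the score.

```python
import math

def score(intercept, coefs, x):
    """Predicted event probability from a fitted logistic model:
    p = 1 / (1 + exp(-(b0 + sum(b_j * x_j))))."""
    eta = intercept + sum(b * xj for b, xj in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical fitted model with two predictors, applied to one new case.
p_new = score(-1.0, [0.5, 2.0], [2.0, 0.0])  # linear predictor is 0
```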

Question 8

Refer to the exhibit:

SAS output from the RSQUARE selection method in the REG procedure is shown. The top two models for each subset size are given.

Based on the AIC statistic, which model is the champion model?

Question 9

Refer to the exhibit:

On the Gains Chart, what is the correct interpretation of the horizontal reference line?

Question 10

A marketing manager wants to identify the customers most likely to purchase additional products as a result of a nationwide marketing campaign.

The manager possesses a historical dataset (CAMPAIGN) of a similar campaign from last year.

It has the following characteristics:

- Target variable Respond (0, 1)
- Continuous predictor Income
- Categorical predictor Homeowner (Y, N)

Which SAS program performs this analysis?

Question 11

Assume a $10 cost for soliciting a non-responder and a $200 profit for soliciting a responder. The logistic regression model gives a probability score named P_R on a SAS data set called VALID. The VALID data set contains the responder variable Pinch, a 1/0 variable coded as 1 for responder. Customers will be solicited when their probability score is more than 0.05.

Which SAS program computes the profit for each customer in the data set VALID?
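Such a program needs some per-customer logic, sketched here in Python rather than as a SAS DATA step. The row values are hypothetical. The cutoff, the $200 profit, and the $10 cost come from the question. The strict inequality matches "more than 0.05".

```python
def customer_profit(p_r, responder, cutoff=0.05, gain=200.0, cost=10.0):
    """Profit from one customer: solicit when P_R > cutoff; a solicited
    responder earns `gain`, a solicited non-responder costs `cost`,
    and an unsolicited customer contributes 0."""
    if p_r > cutoff:
        return gain if responder == 1 else -cost
    return 0.0

# Hypothetical rows of VALID as (P_R, Pinch) pairs.
valid = [(0.20, 1), (0.08, 0), (0.03, 1), (0.01, 0)]
profits = [customer_profit(p, y) for p, y in valid]
total_profit = sum(profits)
```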

Question 12

Refer to the exhibit:

An analyst examined logistic regression models for predicting whether a customer would make a purchase. The ROC curve displayed summarizes the models. Using the selected model and the analyst's decision rule, 25% of the customers who did not make a purchase are incorrectly classified as purchasers.

What can be concluded from the graph?
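The stated error is a false positive rate of 0.25. The operating point therefore sits at x = 0.25 (1 − specificity) on the ROC curve, so specificity is 0.75, and the sensitivity is read off the curve at that x. This sketch uses hypothetical ROC points, since the exhibit is not reproduced, and assumes straight-line segments between them.

```python
def tpr_at_fpr(roc_points, fpr):
    """Linearly interpolate the true positive rate (sensitivity) at a given
    false positive rate from ROC points sorted by FPR."""
    for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]):
        if x0 <= fpr <= x1:
            if x1 == x0:
                return max(y0, y1)
            return y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
    raise ValueError("fpr outside the range of the ROC points")

fpr = 0.25                # 25% of non-purchasers classified as purchasers
specificity = 1.0 - fpr
# Hypothetical (FPR, TPR) points for the selected model.
roc = [(0.0, 0.0), (0.1, 0.45), (0.2, 0.6), (0.3, 0.7), (1.0, 1.0)]
sensitivity = tpr_at_fpr(roc, fpr)
```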

Question 13

A researcher is planning a logistic regression to model the probability of disease occurrence. The researcher determines the rate of disease occurrence in the population is 1%.

For which of the following would this study be a candidate?
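A 1% event rate is rare enough that studies like this commonly use separate sampling (oversampling the events). A model fit to such a sample produces probabilities that reflect the sample's event proportion. A standard prior correction rescales them to the population rate. This sketch assumes a hypothetical 50% event proportion in the sample.

```python
def adjust_probability(p, pi1, rho1):
    """Correct a predicted probability from a model fit to an oversampled
    sample (event proportion rho1) back to the population prior pi1."""
    pi0, rho0 = 1.0 - pi1, 1.0 - rho1
    num = p * pi1 / rho1
    return num / (num + (1.0 - p) * pi0 / rho0)

# pi1 = 0.01 comes from the question; rho1 = 0.5 is a hypothetical sample design.
p_population = adjust_probability(0.5, pi1=0.01, rho1=0.5)
```

A sample score of 0.5, the sample's own base rate, maps back to the 1% population rate.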

Question 14

Which characteristic of Studentized residuals indicates potential outliers?
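A common rule of thumb flags cases whose studentized residuals exceed 2 in absolute value; some texts use 3. This sketch covers a simple linear regression with hypothetical data and computes the internally studentized residual, which PROC REG labels STUDENT.

```python
import math

def studentized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression:
    r_i = e_i / (s * sqrt(1 - h_ii)), with s^2 = SSE / (n - 2) and
    leverage h_ii = 1/n + (x_i - xbar)^2 / Sxx."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, lev)]

# Hypothetical data: y = 2x exactly, except the case at x = 8 is shifted up by 14.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2 * xi for xi in x]
y[7] += 14
r = studentized_residuals(x, y)
flagged = [i for i, ri in enumerate(r) if abs(ri) > 2]
```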