EMC D-DS-FN-23 Dell Data Science Foundations Exam Practice Test

Demo: 17 questions
Total 59 questions

Dell Data Science Foundations Questions and Answers

Question 1

What characterizes the Hadoop Distributed File System?

Options:

A.

Peer-to-peer system designed to run on custom-designed hardware

B.

Peer-to-peer system designed to run on commodity hardware

C.

Master/slave system designed to run on custom-designed hardware

D.

Master/slave system designed to run on commodity hardware

Question 2

When building a K-means clustering model, you notice that the clusters did not segment on the variables that you expected. What should you do?

Options:

A.

Decrease the value of K

B.

Multiply each variable by its standard deviation

C.

Add the WSS to each variable

D.

Check that the data was properly scaled
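Study note: K-means assigns points by Euclidean distance, so a variable measured in large units can dominate the clustering. The sketch below uses made-up income and age values (both hypothetical, not from the exam) to compare distances before and after z-score standardization. It is an illustration of the scaling issue only, not a full clustering run.

```python
import math
import statistics

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

def distance(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical customers: income in dollars, age in years.
income = [30000.0, 32000.0, 90000.0, 95000.0]
age = [25.0, 60.0, 27.0, 62.0]

raw = list(zip(income, age))
scaled = list(zip(standardize(income), standardize(age)))

# Customer 0 vs 1: similar income, very different age.
# Customer 0 vs 2: very different income, similar age.
raw_d_01 = distance(raw[0], raw[1])
raw_d_02 = distance(raw[0], raw[2])
scaled_d_01 = distance(scaled[0], scaled[1])
scaled_d_02 = distance(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_d_01:.1f}  d(0,2)={raw_d_02:.1f}")
print(f"scaled: d(0,1)={scaled_d_01:.2f}  d(0,2)={scaled_d_02:.2f}")
```

On the raw values the age difference barely registers next to the income difference. After standardization the two differences contribute on a comparable scale.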

Question 3

What converts SQL-like commands into either Tez, Spark, or MapReduce jobs that are submitted to the Hadoop cluster?

Options:

A.

Pig

B.

HBase

C.

Hive

D.

Mahout

Question 4

Which SQL OLAP grouping extension returns a result for each output row with 1 identifying a summary row and 0 identifying grouped rows?

Options:

A.

CUBE

B.

GROUPING

C.

GROUP_ID

D.

ROLLUP

Question 5

What is a benefit of Spark in-memory data processing as opposed to using MapReduce?

Options:

A.

Avoids writing intermediate data to disk, which speeds up processing

B.

Supports processing unstructured data, which MapReduce does not allow

C.

Removes the need to use disks at all, which reduces cost

D.

Allows parallel processing, which MapReduce does not support

Question 6

Refer to the exhibit, which shows pairwise counts for items purchased together.

Consider the following association rule: Milk -> Eggs

What is the value of the lift?

Options:

A.

1.18

B.

0.264

C.

120

D.

70.81
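Study note: the exhibit's counts are not reproduced in this text, so the sketch below only shows how lift is commonly computed for a rule X -> Y, as Support(X and Y) / (Support(X) * Support(Y)). The function name `lift` and all counts are hypothetical placeholders and are not the exhibit's values.

```python
def lift(n_total, n_x, n_y, n_xy):
    """Lift of the rule X -> Y from transaction counts.

    n_total: total number of transactions
    n_x:     transactions containing X
    n_y:     transactions containing Y
    n_xy:    transactions containing both X and Y
    """
    support_x = n_x / n_total
    support_y = n_y / n_total
    support_xy = n_xy / n_total
    return support_xy / (support_x * support_y)

# Hypothetical counts, NOT taken from the exhibit.
example = lift(n_total=1000, n_x=200, n_y=250, n_xy=100)
print(round(example, 3))
```

A lift near 1 suggests X and Y occur together about as often as independence would predict, while a lift well above 1 suggests a positive association.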

Question 7

Which component of a final presentation provides a succinct overview of the business situation that was the impetus to initiate the project?

Options:

A.

Model description

B.

Approach

C.

Project goals

D.

Recommendations

Question 8

After creating a density plot, you realize that the data has a long tail to the right. What can you do to make the dataset more normally distributed?

Options:

A.

Use a scatter plot to obtain a better picture

B.

Use a histogram to obtain a better picture

C.

Apply a square transformation

D.

Apply a logarithmic transformation
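Study note: one way to see the effect of a transformation on a right-skewed variable is to compare sample skewness before and after. The sketch below uses a small made-up data set and a simple moment-based skewness formula. Both the data and the helper name `skewness` are assumptions for illustration.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment / (second central moment)^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data (all values must be positive for the log).
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

The logarithm compresses large values more than small ones, which pulls in the right tail. In this toy example the skewness drops but does not reach zero.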

Question 9

You have been given the task of improving your organization's sales force compensation. As a result of a study, your team decides to classify personnel as follows:

● Did not meet quota

● Met quota

● Exceeded 150% of quota

In which data analytics lifecycle phase should you define these categories for analysis purposes?

Options:

A.

Model building

B.

Communicate results

C.

Operationalize

D.

Model planning

Question 10

You have the data from a popular e-commerce website. You are exploring the time spent (in seconds) on the website by 100,000 customers across 14 different product categories.

What visualization can be used to represent the relationship between time spent and product category?

Options:

A.

Rug plot

B.

Scatter plot

C.

Box and whisker plot

D.

Hexbin plot

Question 11

In K-means clustering, what is a graph of the WSS versus the value of K used to help determine?

Options:

A.

Optimal distance between clusters

B.

Average distance between observations

C.

Optimal number of clusters

D.

Average distance between clusters
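Study note: the within sum of squares (WSS) is the total squared distance from each point to its assigned centroid. The sketch below computes WSS for several values of K on a tiny one-dimensional data set, using a plain Lloyd's-style loop with a deterministic initialization. The data, the helper names `kmeans_1d` and `wss`, and the initialization scheme are all assumptions chosen to keep the example short.

```python
def kmeans_1d(points, k, iterations=20):
    """Plain Lloyd's algorithm on 1-D data with evenly spaced initial centroids."""
    ordered = sorted(points)
    # Deterministic start: take k evenly spaced values from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
    return centroids, clusters

def wss(centroids, clusters):
    """Within sum of squares: squared distance of every point to its centroid."""
    return sum(
        (p - c) ** 2 for c, members in zip(centroids, clusters) for p in members
    )

# Hypothetical data with three visually obvious groups.
data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]

wss_by_k = {}
for k in range(1, 6):
    centroids, clusters = kmeans_1d(data, k)
    wss_by_k[k] = wss(centroids, clusters)
    print(f"K={k}: WSS={wss_by_k[k]:.3f}")
```

In this toy data the WSS falls sharply up to K = 3 and then flattens, which is the "elbow" pattern people look for when plotting WSS against K.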

Question 12

What action occurs during feature selection in the model building phase of the data analytics lifecycle?

Options:

A.

Create new combinations of attributes

B.

Overfit the model to improve prediction accuracy

C.

Identify the most useful input variables

D.

Select a superset of variables to shorten training times

Question 13

What is a business driver for Big Data analytics adoption?

Options:

A.

Implement the latest technology and tools

B.

Maintain existing data silos

C.

Identify new business opportunities

D.

Ensure the analysts work in isolation

Question 14

You build a decision tree to classify five different types of customers based on their browsing history from a sample of 500. The resulting decision tree has 17 layers. One of the leaf nodes has only three customers.

What do you conclude?

Options:

A.

The decision tree needs to be rebuilt without the three customers

B.

The decision tree needs to be rebuilt to see if the results change

C.

The sample size is too small, so the classes may not be accurate

D.

Due to the large number of layers, there may be an overfitting problem

Question 15

What is the purpose of applying the naïve Bayes conditional independence assumption?

Options:

A.

To simplify the probability calculations

B.

To calculate the probability of rare events

C.

To minimize rounding errors in probability calculations

D.

To accurately calculate each probability
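Study note: under the conditional independence assumption, a naïve Bayes classifier scores each class as P(class) multiplied by the product of P(feature_i | class), instead of estimating a joint probability over all feature combinations. The sketch below applies this to a tiny made-up table. The data, the feature meanings, and the helper names `score` and `predict` are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (outlook, windy) -> play
rows = [
    (("sunny", "no"), "yes"),
    (("sunny", "yes"), "no"),
    (("rainy", "yes"), "no"),
    (("rainy", "no"), "yes"),
    (("sunny", "no"), "yes"),
    (("rainy", "yes"), "no"),
]

class_counts = Counter(label for _, label in rows)
feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
for features, label in rows:
    for i, value in enumerate(features):
        feature_counts[(i, label)][value] += 1

def score(features, label):
    """Unnormalized posterior: P(label) * product of P(feature_i | label)."""
    result = class_counts[label] / len(rows)
    for i, value in enumerate(features):
        result *= feature_counts[(i, label)][value] / class_counts[label]
    return result

def predict(features):
    """Return the label with the highest unnormalized posterior."""
    return max(class_counts, key=lambda label: score(features, label))

scores = {label: score(("sunny", "no"), label) for label in class_counts}
print(scores, predict(("sunny", "no")))
```

Each conditional probability is estimated from simple per-feature counts, which is what keeps the calculation tractable. A zero count drives the whole product to zero, which is why practical implementations usually add smoothing.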

Question 16

In a user-defined aggregate function, what is FFUNC?

Options:

A.

Optional final calculation function

B.

Window function

C.

State transition function

D.

Segment-level calculation function

Question 17

Which SQL set operator returns rows that exist in the first SELECT statement answer set but not in the second SELECT statement?

Options:

A.

EXCEPT

B.

UNION

C.

UNION ALL

D.

INTERSECT
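Study note: the four set operators listed above can be compared side by side on two small tables. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table names `first_set` and `second_set` and their contents are hypothetical, and other database engines may differ in details such as duplicate handling options.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE first_set (id INTEGER);
    CREATE TABLE second_set (id INTEGER);
    INSERT INTO first_set VALUES (1), (2), (3), (3);
    INSERT INTO second_set VALUES (3), (4);
    """
)

def run(operator):
    """Combine the two SELECT statements with the given set operator."""
    query = (
        f"SELECT id FROM first_set {operator} "
        "SELECT id FROM second_set ORDER BY id"
    )
    return [row[0] for row in conn.execute(query)]

results = {op: run(op) for op in ("UNION", "UNION ALL", "INTERSECT", "EXCEPT")}
for op, ids in results.items():
    print(f"{op:10s} {ids}")
```

Running the script prints one line per operator, which makes it easy to see which rows each one keeps and whether duplicates survive.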
