What characterizes the Hadoop Distributed File System?
When building a K-means clustering model, you notice that the clusters did not segment on variables that you expected. What should you do?
What converts SQL-like commands into Tez, Spark, or MapReduce jobs that are submitted to the Hadoop cluster?
Which SQL OLAP grouping extension returns a result for each output row with 1 identifying a summary row and 0 identifying grouped rows?
What is a benefit of Spark in-memory data processing as opposed to using MapReduce?
Refer to the exhibit, which shows pairwise counts for items purchased together.
Consider the following association rule: Milk -> Eggs
What is the value of the lift?
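For reference, the lift of a rule X -> Y is Support(X and Y) / (Support(X) * Support(Y)). The exhibit's counts are not reproduced here, so the sketch below uses made-up transaction counts purely to show the arithmetic; the function name and all numbers are illustrative assumptions, not values from the exhibit.

```python
def lift(n_both, n_x, n_y, n_total):
    """Lift of rule X -> Y computed from raw transaction counts.

    Equivalent to support(X and Y) / (support(X) * support(Y)),
    rearranged so the division happens once.
    """
    return (n_both * n_total) / (n_x * n_y)

# Hypothetical counts (NOT from the exhibit):
# 1000 transactions, 500 contain milk, 400 contain eggs, 300 contain both.
example_lift = lift(n_both=300, n_x=500, n_y=400, n_total=1000)
print(example_lift)
```

A lift above 1 indicates that the two items appear together more often than they would if they were statistically independent.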
Which component of a final presentation provides a succinct overview of the business situation that was the impetus to initiate the project?
After running a density plot, you realize that the data has a long tail to the right. What can you do to make the dataset more normally distributed?
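One common approach to a long right tail is a logarithmic transformation. The sketch below is a minimal illustration on a made-up, strictly positive sample; the `skewness` helper and the data are assumptions introduced only for this example.

```python
import math

def skewness(values):
    """Population skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample; all values must be > 0 for log()
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in raw]

print("skewness before:", skewness(raw))
print("skewness after: ", skewness(logged))
```

The log compresses large values far more than small ones, which pulls the right tail in and reduces the positive skew.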
You have been given the task of improving your organization's sales force compensation. As a result of a study, your team decides to classify personnel as follows:
● Did not meet quota
● Met quota
● Exceeded 150% of quota
In which data analytics lifecycle phase should you define these categories for analysis purposes?
You have the data from a popular e-commerce website. You are exploring the time spent (in seconds) on the website by 100,000 customers across 14 different product categories.
What visualization can be used to represent the relationship between time spent and product category?
In K-means clustering, what is a graph of the WSS versus the value of K used to help determine?
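To make the WSS-versus-K plot concrete, the sketch below computes WSS (within sum of squares) for several values of K on a tiny one-dimensional toy dataset. The `kmeans_1d` and `wss` helpers, the deterministic starting centroids, and the data are all assumptions made for this illustration; real work would normally use a library implementation.

```python
def kmeans_1d(points, k, iterations=20):
    """Tiny 1-D Lloyd's algorithm with deterministic starting centroids."""
    pts = sorted(points)
    # Start centroids at evenly spaced positions in the sorted data
    centroids = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda j: (p - centroids[j]) ** 2)
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    return centroids, clusters

def wss(centroids, clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum((p - c) ** 2
               for c, cluster in zip(centroids, clusters)
               for p in cluster)

# Hypothetical data with three obvious groups
data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
wss_by_k = {k: wss(*kmeans_1d(data, k)) for k in range(1, 5)}
for k, value in wss_by_k.items():
    print(k, value)
```

On this toy data the WSS falls sharply up to K = 3 and barely changes afterwards; that bend (the "elbow") is what the plot is read for.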
What action occurs during feature selection in the model building phase of the data analytics lifecycle?
What is a business driver for Big Data analytics adoption?
You build a decision tree to classify five different types of customers based on their browsing history from a sample of 500. The resulting decision tree has 17 layers. One of the leaf nodes has only three customers.
What do you conclude?
What is the purpose of applying the naïve Bayes conditional independence assumption?
In a user-defined aggregate function, what is FFUNC?
Which SQL set operator returns rows that exist in the first SELECT statement answer set but not in the second SELECT statement?