Databricks Spark architecture question analysis

  • Spark
  • Databricks
  • Certification

posted on 03 Sep 2020

Spark Architecture Questions Analysis

Content Outline

Spark Architecture Basics

As for the basics of the Spark architecture, the following concepts are assessed by this exam:

  • Cluster architecture: nodes, drivers, workers, executors, slots, etc.
  • Spark execution hierarchy: applications, jobs, stages, tasks, etc.
  • Shuffling
  • Partitioning
  • Lazy evaluation
  • Transformations vs. actions
  • Narrow vs. wide transformations
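To make lazy evaluation and the transformation/action split concrete, here is a minimal sketch in plain Python. It is a toy model, not the Spark API: the class `LazyDataset` and the helper `double` are hypothetical names introduced only for illustration. Transformations only record a step in a plan, and nothing runs until an action is called.

```python
class LazyDataset:
    """Toy stand-in for a DataFrame (hypothetical, not a Spark class)."""

    def __init__(self, rows, plan=None):
        self._rows = list(rows)
        self._plan = plan or []  # recorded transformations, not yet executed

    # Transformations: return a new object with a longer plan, compute nothing.
    def map(self, fn):
        return LazyDataset(self._rows, self._plan + [("map", fn)])

    def filter(self, pred):
        return LazyDataset(self._rows, self._plan + [("filter", pred)])

    # Action: runs the whole recorded plan and returns a result.
    def collect(self):
        rows = self._rows
        for kind, fn in self._plan:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


calls = []

def double(x):
    calls.append(x)  # side effect lets us observe when work really happens
    return x * 2

ds = LazyDataset([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
work_before_action = len(calls)  # still 0: only a plan has been built
result = ds.collect()            # the action triggers execution
```

In Spark the same idea applies at cluster scale: calls such as select() or filter() extend a plan, while actions such as collect() or count() start a job.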

Spark Architecture Application

In addition, candidates are asked to apply their knowledge of the following to make optimal decisions when working with Spark. Candidates should be able to interpret how these topics affect a Spark session and how they can use them to improve performance.

  • Execution deployment modes
  • Stability
  • Garbage collection
  • Out-of-memory errors
  • Storage levels
  • Repartitioning
  • Coalescing
  • Broadcasting
  • DataFrames
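Repartitioning and coalescing are easy to confuse, so the following plain-Python sketch contrasts them. It is a simplified model under stated assumptions, not Spark code: partitions are represented as lists of rows, `repartition` here spreads rows round-robin, and `coalesce` merges whole partitions in index order (real Spark decides the grouping itself).

```python
def repartition(partitions, n):
    """Full shuffle: every row may move to a different partition."""
    rows = [row for part in partitions for row in part]
    out = [[] for _ in range(n)]
    for i, row in enumerate(rows):  # round-robin distribution (simplified)
        out[i % n].append(row)
    return out


def coalesce(partitions, n):
    """Merge whole existing partitions; can only reduce the partition count."""
    n = min(n, len(partitions))     # asking for more partitions changes nothing
    out = [[] for _ in range(n)]
    for i, part in enumerate(partitions):
        out[i % n].extend(part)     # partitions are moved as a unit, rows are not split
    return out


skewed = [[1, 2, 3, 4, 5, 6], [7], [8], [9]]

balanced = repartition(skewed, 2)   # even sizes, at the cost of moving every row
merged = coalesce(skewed, 2)        # cheaper, but the skew remains
unchanged = coalesce(skewed, 8)     # cannot increase the partition count
```

The trade-off this sketch illustrates: coalescing avoids a full shuffle but can leave partitions unevenly sized, while repartitioning pays for a shuffle to get an even distribution.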

Definitions

What do they assess?

Your knowledge of what something is or what something does.

Example

Which of the following describes a worker node?

  • A. Worker nodes are the nodes of a cluster that perform computations.
  • B. Worker nodes are synonymous with executors.
  • C. Worker nodes always have a one-to-one relationship with executors.
  • D. Worker nodes are the most granular level of execution in the Spark execution hierarchy.
  • E. Worker nodes are the most coarse level of execution in the Spark execution hierarchy.

Relationships

What do they assess?

Your knowledge of how two or more concepts relate to each other.

Example

Which of the following describes the relationship between worker nodes and executors?

  • A. An executor is a Java Virtual Machine (JVM) running on a worker node.
  • B. A worker node is a Java Virtual Machine (JVM) running on an executor.
  • C. There are always more worker nodes than executors.
  • D. There are always the same number of executors and worker nodes.
  • E. Executors and worker nodes are not related.
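The containment relationship between worker nodes, executors, and slots can be sketched as a small data model. This is an illustrative sketch only: `WorkerNode`, `Executor`, and `total_slots` are hypothetical names, and the one-slot-per-core rule assumes each task uses a single CPU (the default value of spark.task.cpus).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Executor:
    cores: int  # one slot per core, assuming one CPU per task


@dataclass
class WorkerNode:
    executors: List[Executor] = field(default_factory=list)


def total_slots(nodes):
    """Number of tasks the cluster can run at the same time."""
    return sum(ex.cores for node in nodes for ex in node.executors)


# Hypothetical cluster: two worker nodes hosting three executors in total.
cluster = [
    WorkerNode([Executor(cores=4), Executor(cores=4)]),
    WorkerNode([Executor(cores=4)]),
]

node_count = len(cluster)
executor_count = sum(len(node.executors) for node in cluster)
slot_count = total_slots(cluster)
```

Here a node hosts one or more executors, so the number of executors does not have to match the number of nodes.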

Results

What do they assess?

Your ability to predict results (i.e. if “x” occurs, what happens?).

Example

If Spark is running in cluster mode, which of the following statements about nodes is correct?

  • A. There is a single worker node that contains the Spark driver and the executors.
  • B. The Spark driver runs in its own non-worker node without any executors.
  • C. Each executor is a running JVM inside of a worker node.
  • D. There is always more than one node.
  • E. There might be more executors than total nodes or more total nodes than executors.

Classification

What do they assess?

Your ability to categorize ideas/things.

Example

Which of the following DataFrame operations is always classified as a narrow transformation?

  • A. DataFrame.select()
  • B. DataFrame.sort()
  • C. DataFrame.distinct()
  • D. DataFrame.join()
  • E. DataFrame.repartition()
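One way to reason about classification questions like this is to ask whether an operation needs data from other partitions, which means a shuffle. The lookup below is a study aid in plain Python, not a Spark API; the names `NEEDS_SHUFFLE` and `classify` are hypothetical, and the join entry describes the general case (a broadcast join can avoid shuffling the larger side, which is why the word "always" matters in the question).

```python
# True means the operation generally moves rows between partitions (a shuffle).
NEEDS_SHUFFLE = {
    "select": False,      # each output row depends on one input row
    "filter": False,      # rows are kept or dropped in place
    "sort": True,         # global ordering needs rows from every partition
    "distinct": True,     # duplicates may live in different partitions
    "join": True,         # general case; a broadcast join can avoid this
    "repartition": True,  # redistributes rows across partitions
}


def classify(operation):
    """Return 'wide' if the operation needs a shuffle, otherwise 'narrow'."""
    return "wide" if NEEDS_SHUFFLE[operation] else "narrow"


labels = {op: classify(op) for op in NEEDS_SHUFFLE}
narrow_ops = sorted(op for op, label in labels.items() if label == "narrow")
```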

Cluster configurations

Figure: cluster-example