How to Prepare for Spark Certification


As I prepared for the Developer Certification for Apache Spark by Databricks and O’Reilly, I noticed that there weren’t many resources around, so I thought I’d collect and share the ones I used to prepare for the exam. Hope it helps!

Test Environment

I do a lot of different development on my laptop and I’m well and truly over debugging issues with software versions or missing libraries. These days I try to isolate each development environment as either a Docker container or a Vagrant image, which lets me run multiple versions locally. For Spark I use the Vagrant image from Gustavo here, which includes Spark and Zeppelin. If you want to run a later version such as 1.6, you’ll need to edit the install-02-spark.sh file and add the latest Spark and Hadoop versions:

install-04.zeppelin.sh and add the latest Maven build:

and finally the install-99-cleanup.sh:

Follow the steps in the blog post linked above and you’ll have a functioning Spark test environment in a little under an hour. This is great for working through examples quickly, but it’s important to remember that the exam is aimed at people with experience running Spark in a production environment. That means you’ll need some time on a cluster to understand how Spark works.

The 5 Main Themes

  1. Understanding the breadth of the Spark API usage across Scala, Java and Python
  2. Applying Best Practices to avoid runtime issues and performance bottlenecks
  3. Distinguishing Spark features and practices from MapReduce usage
  4. Integrating SQL, Streaming, ML and Graph atop the Spark unified engine
  5. Solving typical use cases with Spark in Scala, Java and Python
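
To make the third theme a little more concrete, here is a minimal sketch of a word count written two ways. The first function imitates the explicit map, shuffle and reduce phases of a MapReduce job in plain Python; it is only an illustration, not Hadoop code. The second expresses the same job as a single chain of RDD transformations. The function names and the sample data are made up for this example, and `sc` is assumed to be an existing PySpark SparkContext.

```python
from collections import defaultdict


def mapreduce_word_count(lines):
    """Word count written as three explicit phases, MapReduce style."""
    # Map phase: emit a (word, 1) pair for every word in every line.
    mapped = [(word, 1) for line in lines for word in line.split()]

    # Shuffle phase: group the emitted values by key.
    grouped = defaultdict(list)
    for word, one in mapped:
        grouped[word].append(one)

    # Reduce phase: sum the values collected for each key.
    return {word: sum(ones) for word, ones in grouped.items()}


def spark_word_count(sc, lines):
    """The same job as one chain of RDD transformations.

    `sc` is assumed to be an existing pyspark SparkContext.
    """
    return (
        sc.parallelize(lines)
        .flatMap(lambda line: line.split())   # one record per word
        .map(lambda word: (word, 1))          # pair each word with a count
        .reduceByKey(lambda a, b: a + b)      # sum counts per word
        .collectAsMap()                       # bring the result to the driver
    )


# Hypothetical sample input, used only for illustration.
sample = ["to be or not to be", "to spark or not"]
print(mapreduce_word_count(sample))
```

In the pyspark shell or a Zeppelin notebook, where `sc` is normally predefined, calling `spark_word_count(sc, sample)` should give the same counts. The point to notice is that Spark handles the shuffle behind `reduceByKey`, so the phases no longer have to be written out by hand.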

Which Language?

The exam isn’t language specific and includes code examples in Scala, Java, Python and SQL. Some questions ask you to compare or identify equivalent Spark techniques across languages, but the exam is specifically about Spark. It doesn’t go into language nuances or require you to write any code, only to pick the best answer from several code blocks. So you’ll need to be able to read all of these languages, and I’d suggest going through examples in several of them rather than just the one you’re most familiar with.

Exam Details

You can take the exam either online or at a test center. O’Reilly use Kryterion test centers so you’re bound to find one near you. Note that the online option has some prerequisites, and the online exam has some known issues and bugs. Please read this information for workarounds and support.

You’re given 90 minutes to complete the 40 multiple choice questions, although most people find they can complete the exam in an hour.
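
As a rough guide to pacing, the figures above translate into a per-question time budget. This is just arithmetic on the numbers quoted in this post, and the function name is made up for the example.

```python
EXAM_MINUTES = 90      # time allowed, as stated above
QUESTION_COUNT = 40    # number of multiple choice questions


def seconds_per_question(total_minutes, questions):
    """Average time available per question, in seconds."""
    return total_minutes * 60 / questions


# Budget with the full 90 minutes, and if you finish in about an hour.
print(seconds_per_question(EXAM_MINUTES, QUESTION_COUNT))  # 135.0
print(seconds_per_question(60, QUESTION_COUNT))            # 90.0
```

So there is a little over two minutes per question on average, which leaves time to read each code block carefully.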

Learning Resources

Videos

Example Exercises

  • Spark Summit Training – both an introductory track and an advanced track. Slides, files and video are all available for download from the Spark Summit site. Make sure you’re comfortable with the materials from the advanced track if you’re preparing for the certification.

Books

Sites

Courses

