Apache Spark

Timeline: 4 Days

Topics

  • What is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark
  • Why HDFS?
  • HDFS Architecture
  • Using HDFS
  • Spark and the Hadoop Ecosystem
  • Spark and MapReduce
  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce and Pair RDD Operations
  • Standalone Cluster
  • The Spark Standalone Web UI
  • RDD Partitions and HDFS Data Locality
  • Working With Partitions
  • Executing Parallel Operations
  • Distributed Persistence
  • Caching
  • SparkContext
  • Spark Properties
  • Building and Running a Spark Application
  • Logging
  • Streaming Overview
  • Sliding Window Operations
  • Spark Streaming Applications