Hadoop Data Analyst

Timeline: 3 Days

Topics

  • Hadoop Overview
  • The Hadoop Ecosystem
  • The Hadoop Distributed File System (HDFS)
  • Inputting Data into HDFS
  • The MapReduce Framework and YARN
  • Overview of Sqoop/Flume
  • Overview of the Oozie Workflow Engine
  • Pig’s Features/Use Cases
  • Interacting with Pig
  • Pig Latin
  • Loading Data
  • Field Definitions and Simple Data Types
  • Data Output
  • Viewing the Schema
  • Filtering/Sorting Data
  • Common Functions
  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-in Functions for Working with Complex Data
  • Iterating Grouped Data
  • Combining Data Sets
  • Joining Data Sets
  • Set Operations
  • Splitting Data Sets
  • Parameters
  • Macros/Imports
  • UDFs
  • Using Other Languages to Process Data with Pig
  • Logging
  • Hadoop’s Web UI
  • Data Sampling and Debugging
  • Understanding the Execution Plan
  • Improving Performance
  • Hive Schema and Data Storage
  • Hive vs. Traditional Databases
  • Hive vs. Pig
  • When to Use Hive
  • Relational Data Analysis with Hive
  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hive Data Formats
  • Creating Databases and Hive-managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Text Processing
  • Important String Functions
  • Using Regular Expressions in Hive
  • Understanding Query Performance
  • Controlling the Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data
  • Data Transformation with Custom Scripts
  • User-defined Functions
  • Parameterized Queries