Topics
- A Brief History of Hadoop
- Core Hadoop Components
- Fundamental Concepts
- General Planning Considerations
- Choosing Hardware
- Network Considerations
- Configuring Nodes
- Planning for Cluster Management
- HDFS Features
- Writing and Reading Files
- NameNode Considerations
- HDFS Security
- NameNode Web UI
- Hadoop File Shell
- Pulling Data from External Sources with Flume
- Importing Data from Relational Databases with Sqoop
- REST Interfaces
- Best Practices
- MapReduce Overview
- Features of MapReduce
- Architectural Overview
- YARN and MapReduce Version 2
- Failure Recovery
- The JobTracker Web UI
- Configuration and Deployment Types
- Installing Hadoop
- Specifying the Hadoop Configuration
- Initial HDFS and MapReduce Configuration
- Log Files
- Hive
- Impala
- Pig
- What is a Hadoop Client?
- Installing and Configuring Hadoop Clients
- Installing and Configuring Hue
- Hue Authentication and Configuration
- Advanced Configuration Parameters
- Configuring Hadoop Ports
- Explicitly Including and Excluding Hosts
- Configuring HDFS for Rack Awareness and High Availability
- Why Hadoop Security is Important
- Hadoop’s Security System Concepts
- What Kerberos is and How it Works
- Securing a Hadoop Cluster with Kerberos
- Managing Running Jobs
- Scheduling Hadoop Jobs
- Configuring the FairScheduler
- Checking HDFS Status
- Copying Data Between Clusters
- Adding/Removing Cluster Nodes
- Rebalancing the Cluster
- NameNode Metadata Backup
- Cluster Upgrades
- General System Monitoring
- Managing Hadoop’s Log Files
- Monitoring the Cluster
- Common Troubleshooting Issues