Hadoop

Hadoop Training 

Course Description:  

Introduction

  • Data Analytics
  • Introduction to RDBMS
  • What is Big Data?
  • Big Data Challenges
  • Technologies That Support Big Data
  • Hadoop Introduction

Pre-requisites

  • Core Java
      1. Java Virtual Machine
      2. OOP Principles
      3. Exceptions
      4. Multithreading
      5. Map
  • Linux
      1. Basics
      2. Installations
      3. Commands
  • VMware
      1. Basics
      2. Installations
      3. Backups
      4. Snapshots
  • SQL
      1. Create Table
      2. Order By
      3. Aggregate Functions
      4. Joins

Hadoop

  • What is Hadoop?
  • Hadoop "Powered By" List and Users
  • Scalability
  • Distributed Framework
  • Hadoop versus RDBMS
  • Brief History of Hadoop

Hadoop Daemon Processes

  • NameNode
  • Secondary NameNode
  • JobTracker
  • TaskTracker
  • DataNode

Hadoop Distributed File System

  • HDFS Design and Architecture
  • HDFS Concepts
  • HDFS High-Availability
  • Interacting with HDFS Using the CLI and Browser
  • Blocks
  • Replication
  • Fault Tolerance
  • Priorities
  • Writing Data into HDFS
  • Reading Data from HDFS

MapReduce

  • The Parts of a Hadoop MapReduce Job
  • How MapReduce Works
  • MapReduce Types and Formats
  • Input Formats
  • Text Input
  • Multiple Inputs
  • Database Input (and Output)
  • Output Formats
  • Map and Reduce Explained with an Example

Hadoop Cluster Configuration

  • Pseudo Distributed mode
  • Cluster mode
  • IPv6
  • SSH
  • Installation of Java and Hadoop
  • Hadoop Configuration
  • Hadoop Processes (NN, SNN, JT, DN, TT)
  • Temporary Directory
  • UI
  • Common Errors When Running a Hadoop Cluster and Their Solutions

Advanced MapReduce Concepts

  • Developing a MapReduce Application
  • Phases in the MapReduce Framework
  • MapReduce Input and Output Formats
  • Advanced Concepts
  • Sample Applications
  • Combiner
  • Map-Side Join
  • Reduce-Side Join
  • Custom Input Format Class
  • Hash Partitioner
  • Custom Partitioner
  • Sorting Techniques
  • Custom Output Format Class

Hive

  • Installing Hive
  • The Hive Shell
  • Running Hive
  • Configuring Hive
  • Hive Services
  • The Metastore
  • Comparison with Traditional Databases
  • Schema on Read Versus Schema on Write
  • Running a SQL-style query with Hive
  • Performing a join with Hive
  • Case Study & Example

Pig

  • Installing Pig
  • Running Pig
  • Set operations (join, union)
  • Sorting with Pig
  • Speaking Pig Latin
  • Working with user-defined functions
  • Working with scripts
  • Case Study & Example

Sqoop

  • Introduction
  • Import Data
  • Export Data
  • Sqoop Syntax
  • Database Connections
  • Hands-on Exercise

Impala

  • Introduction to Impala
  • Impala Configuration
  • Comparison between Hive and Impala
  • Impala Commands
  • Example with Use Case

Oozie Workflow

  • Introduction to Oozie
  • Creating Workflows
  • Creating Job Schedules
  • Example with Use Case

Flume

  • Introduction
  • Configuration and Setup
  • Flume Sink with example
  • Channel
  • Flume Source with example
  • Complex Flume Architecture

Hue

  • Introduction to Hue
  • Advantages of Hue
  • Hue Web Interface
  • Ecosystem Components in Hue
  • Example with Use Case

HBase

  • Introduction
  • Configuration
  • Basic Hadoop/ZooKeeper/HBase configurations
  • HBase Versus RDBMS
  • Example with Use Case

ZooKeeper

  • Introduction to ZooKeeper
  • Cluster Maintenance
  • Processing Watcher Services
  • Example with Use Case

Real-Life Use Cases

  • Recommendation Engine
  • Prediction
  • Trend Analysis
  • Data Mining
  • Best Practices

Reporting Tool:

Tableau

  • Tableau Fundamentals
  • Tableau Analytics
  • Visual Analytics
  • Hands-on Exercise

 
