 

HADOOP DEVELOPER / ANALYST

This course builds your development and data analysis skills on massive data sets in a Hadoop cluster, using SQL and familiar scripting languages.

 

Target Audience: BI Analysts, BI Developers, Data Analysts, Business Analysts, Quality Analysts, Programmers and Beginners.

 

MODULE-1: BIG DATA & HADOOP INTRODUCTION

  • State of Data
  • Big Data Evolution (Volume, Velocity & Variety)
  • The Motivation for Hadoop
  • Hadoop Distributions (Cloudera, Hortonworks, MapR, IBM, Microsoft, Amazon, etc.)
  • Enterprise, Cloud & Local Hadoop

 

MODULE-2: HADOOP ECO-SYSTEM

  • Hadoop Evolution (Hadoop Gen 1 vs Hadoop Gen 2)
  • Hadoop Technology Stack
  • Hadoop Core (Common) / Projects / Incubator
  • Modern Data Architecture with existing Data Repositories

 

MODULE-3: HADOOP LOCAL INSTALLATION

  • Cloudera's Distribution Including Apache Hadoop (CDH VM)
  • Hortonworks Data Platform (HDP)
  • Apache Hadoop Overview

 

MODULE-4: HADOOP DISTRIBUTED FILE SYSTEM (HDFS)

  • Data Storage: HDFS
  • HDFS Architecture (NameNode, DataNode & Secondary NameNode)
  • HDFS Features & Internals
  • HDFS Interaction & Management
  • HDFS LAB Sessions

 

MODULE-5: MAPREDUCE & YARN

  • Distributed Data Processing: MapReduce, Yet Another Resource Negotiator (YARN)
  • MapReduce Architecture (JobClient, JobTracker, TaskTracker)
  • MapReduce Internals (Input, Split, Map, Combine, Shuffle, Sort, Reduce, Output)
  • Classic MapReduce (MapReduce 1) vs YARN (MapReduce 2)
  • YARN Architecture (ResourceManager, NodeManager, ApplicationMaster)
  • MapReduce LAB Sessions with Java Programs
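
As an illustration of the MapReduce internals listed above (Input, Split, Map, Combine, Shuffle, Sort, Reduce, Output), here is a minimal single-process sketch of a word count in Python. It does not use the Hadoop API, and the lab sessions themselves use Java. All function names (split_input, map_phase, combine, shuffle_and_sort, reduce_phase, word_count) and the split_size parameter are hypothetical, chosen only to mirror the phase names.

```python
from collections import Counter, defaultdict
from itertools import islice


def split_input(lines, split_size):
    """Input/Split: cut the input into fixed-size chunks, one per map task."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, split_size))
        if not chunk:
            return
        yield chunk


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in the split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def combine(pairs):
    """Combine: pre-aggregate counts locally, before any data is shuffled."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle_and_sort(mapper_outputs):
    """Shuffle/Sort: group values by key across all mappers, ordered by key."""
    grouped = defaultdict(list)
    for pairs in mapper_outputs:
        for word, count in pairs:
            grouped[word].append(count)
    return sorted(grouped.items())


def reduce_phase(grouped):
    """Reduce: sum the grouped counts for each key."""
    return [(word, sum(counts)) for word, counts in grouped]


def word_count(lines, split_size=2):
    """Run all phases in order on an in-memory list of lines (assumed input)."""
    mapper_outputs = [combine(map_phase(s)) for s in split_input(lines, split_size)]
    return reduce_phase(shuffle_and_sort(mapper_outputs))


if __name__ == "__main__":
    sample = ["big data big", "hadoop data", "big hadoop"]
    for word, total in word_count(sample):
        print(word, total)
```

In a real cluster each split is processed by a separate map task on a different node and the shuffle moves data over the network. This sketch only shows the order of the phases and what each one hands to the next.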

 

MODULE-6: HIVE INTRODUCTION

  • Hive Architecture Overview
  • Installing and Running Hive 
  • Schema and Data Storage
  • Hive Principles - Schema on Read & The Hive Warehouse
  • Hive vs. Traditional Relational Databases
  • Hive Access Tools (Shell, Web UI, Thrift Client, JDBC/ODBC Driver)
  • Hive Services
  • Hive Metastore
  • Use Cases

 

MODULE-7: DEVELOPING WITH HIVE

  • Hive Query Language (HiveQL)
  • Using Command Line Interface (CLI) and Hue UI to Execute Queries
  • Data Types & Type Conversions
  • Data Storage & Managing Metadata
  • Creating / Altering Databases and Tables 
  • Loading Data - Hive Managed and External Tables
  • Simplifying Queries with Views
  • Joining Datasets (Inner, Outer, Semi & Map)
  • Built-In Functions
  • Aggregation, Windowing and Analytics Functions
  • User-Defined Functions (Java) & Streaming (Python) from HiveQL
  • SerDe, Performance & Security
  • HIVE LAB Sessions

 

MODULE-8: PIG INTRODUCTION

  • Pig Overview
  • Installing and Running Pig
  • Pig's Features & Use Cases
  • Pig Data Model, Execution Modes and Methods
  • Pig (Procedural) vs Hive (Declarative)

 

MODULE-9: DEVELOPING WITH PIG LATIN

  • Pig Latin Basics
  • Data Types and Storage Formats
  • Loading and Storing Data
  • Filtering, Sorting, Grouping & Iterating Grouped Data
  • Joining and Splitting Data Sets
  • Set Operations
  • Commonly-Used Built-In Functions
  • Developing User-Defined Functions (Java) and Macros and Invoking Them from Pig
  • Parameter Substitution Methods
  • Troubleshooting, Debugging and Logging Pig
  • PIG LAB Sessions

 

MODULE-10: HADOOP DATA INTEGRATION, SCHEDULING & OPERATIONS (SQOOP & OOZIE) INTRODUCTION

  • Data Import / Export between Relational Databases and HDFS / HIVE
  • Workflow Development
  • Use Cases

 

MODULE-11: CHOOSING THE BEST TOOL FOR THE REQUIREMENT

  • Comparing MapReduce, Pig, Hive, Impala and Relational Databases

 

Classes: 22-25 Hours

Lab Sessions: 25 Hours
 
Duration: 6 Weeks
 
LIVE Session FEE: $450 (Special Discount for Students)
 
Self Paced On-Demand Videos & Material FEE: $150
 
 
***As per the tutor's discretion, some of the provided course content may be altered/omitted to suit the class needs***
 
**Used Images and Logos are Trademarks of the Respective Companies**
 
*The listed individual course fee does not apply to corporate customers or students*
 
 






Please contact us for course details, including currently offered courses, course content, pricing and schedule.

Free Demo

If you are interested in upgrading your skill set, consider attending one of our real classes as a free demo to evaluate the class quality before deciding on your course of action.