Cloudera Data Analyst Training Using Pig, Hive, and Impala


Cloudera University’s four-day data analyst training course, which focuses on Apache Pig, Apache Hive, and Cloudera Impala, teaches you to apply traditional data analytics and business intelligence skills to big data. The course covers the tools data professionals need to access, manipulate, transform, and analyze complex data sets using SQL and familiar scripting languages.


4 Days



Hadoop Fundamentals 

The Motivation for Hadoop 

Hadoop Overview 

Data Storage: HDFS 

Distributed Data Processing: YARN, MapReduce, and Spark 

Data Processing and Analysis: Pig, Hive, and Impala 

Data Integration: Sqoop 

Other Hadoop Data Tools 

Exercise Scenarios Explanation 


Introduction to Pig 

What Is Pig? 

Pig’s Features 

Pig Use Cases 

Interacting with Pig 


Basic Data Analysis with Pig 

Pig Latin Syntax 

Loading Data 

Simple Data Types 

Field Definitions 

Data Output 

Viewing the Schema 

Filtering and Sorting Data 

Commonly Used Functions 


Processing Complex Data with Pig 

Storage Formats 

Complex/Nested Data Types 

Grouping 

Built-In Functions for Complex Data 

Iterating Grouped Data 


Multi-Dataset Operations with Pig 

Techniques for Combining Data Sets 

Joining Data Sets in Pig 

Set Operations 

Splitting Data Sets 


Pig Troubleshooting and Optimization 

Troubleshooting Pig 

Using Hadoop’s Web UI 

Data Sampling and Debugging 

Performance Overview 

Understanding the Execution Plan 

Tips for Improving the Performance of Your Pig Jobs 


Introduction to Hive and Impala 

What Is Hive? 

What Is Impala? 

Schema and Data Storage 

Comparing Hive to Traditional Databases 

Hive Use Cases 


Querying with Hive and Impala 

Databases and Tables 

Basic Hive and Impala Query Language Syntax 

Data Types 

Differences Between Hive and Impala Query Syntax 

Using Hue to Execute Queries 

Using the Impala Shell 


Data Management 

Data Storage 

Creating Databases and Tables 

Loading Data 

Altering Databases and Tables 

Simplifying Queries with Views 

Storing Query Results 


Data Storage and Performance 

Partitioning Tables 

Choosing a File Format 

Managing Metadata 

Controlling Access to Data 


Relational Data Analysis with Hive and Impala 

Joining Datasets 

Common Built-In Functions 

Aggregation and Windowing 


Working with Impala 

How Impala Executes Queries 

Extending Impala with User-Defined Functions 

Improving Impala Performance 


Analyzing Text and Complex Data with Hive 

Complex Values in Hive 

Using Regular Expressions in Hive 

Sentiment Analysis and N-Grams 



Hive Optimization 

Understanding Query Performance 

Controlling Job Execution Plan 

Indexing Data 


Extending Hive 

Data Transformation with Custom Scripts 

User-Defined Functions 

Parameterized Queries 


Choosing the Best Tool for the Job 

Comparing MapReduce, Pig, Hive, Impala, and Relational Databases 

Which to Choose? 


