Course Plan (Detailed)

Module | Topics | Demo/Project

Module 1: Introduction to Big Data and Hadoop
This lesson covers traditional systems, the problems associated with traditional large-scale systems, and what Hadoop and its ecosystem are.
Introduction to Big Data
Big Data Analytics
What is Big Data
Four Vs of Big Data
Challenges of Traditional Systems
Distributed Systems
Introduction to Hadoop
Components of the Hadoop Ecosystem

Module 2: Hadoop Architecture, Distributed Storage (HDFS), and YARN
This lesson covers distributed processing on a cluster, HDFS architecture, how to use HDFS, YARN as a resource manager, YARN architecture, and how to work with YARN.
What HDFS Is and Why It Is Needed
Regular File System vs HDFS
HDFS Architecture and Components
High Availability Cluster Implementation
HDFS Component File System and Namespace
Data Block Split
Data Replication and Rack Awareness
HDFS Command Line Demo
Resource Management: YARN
Resource Management: YARN Architecture
Resource Management: Working with YARN
Walkthrough of the Cluster (Demo)

Module 3: Distributed Processing Using MapReduce
This lesson covers distributed processing frameworks, MapReduce and its characteristics, and advanced MapReduce concepts.
Distribution Phases
MapReduce Framework
Word Count Example (Demo)
MapReduce Jobs
Joins in MapReduce

Module 4: Data Ingestion and ETL
This lesson covers Sqoop, basic imports and exports in Sqoop, improving Sqoop’s performance, the limitations of Sqoop, and Sqoop 2; Apache Flume, the Flume architecture, and Flume sources, sinks, channels, and configurations; and Apache Kafka, its data model, and its architecture, with Zeppelin integration.
Apache Sqoop and Its Use
Import and Export Using Sqoop Between MySQL and HDFS (Demo)
Sqoop Connectors
Sqoop Demo
Limitations of Sqoop
Sqoop 2
Apache Flume and its Use
Flume Model and Scalability
Flume Architecture
Configuring Flume Components
Ingesting Real-Time Twitter Data (Demo)
Apache Kafka
Aggregating user activity using Kafka
Kafka Data Model and Partitions
Kafka Architecture
API: Producer Side and Consumer Side
Setting Up a Kafka Cluster (Demo)
Creating a Sample Kafka Data Pipeline Using a Producer and Consumer (Demo)
Practice Project: Data Ingestion into Big Data Systems and ETL (Project)

Module 5: Apache Pig
This lesson covers Apache Pig, its components, and Pig vs SQL, and shows how to work with Pig.
Introduction to Pig
Advantages of Pig over MapReduce
Pig Architecture
Pig Data Model
Pig Modes
Pig Operations and Relations
Analysing Sales Data Using Pig (Demo)
Word Count Problem Using Pig (Demo)
Practice Project: Airline Data (Project)

Module 6: Apache Hive
This lesson introduces Hive and Impala: why to use them, how they differ, how Hive works and how it compares to traditional databases, and advanced Hive concepts.
Introduction to Hive
Hive over MapReduce
Hive vs Impala
Hive Architecture
Hive Metastore
Hive DDL and DML (Demo)
Hive Operations
Data Types and Validations
File Format Types
HCatalog
Data Serialization
Hive Optimization
Hive Partitioning
Hive Bucketing
Hive Sampling
CRUD Operations in Hive
Hive Functions
UDF and UDAF
Practice Project: Movie Awards Data (Project)

Module 7: NoSQL Database HBase
This lesson introduces HBase, HBase architecture, data storage in HBase, and HBase vs RDBMS.
NoSQL Introduction
HBase Overview
HBase Architecture
Data Model
Connecting to HBase
Working with HBase (Demo)

Module 8: Basics of Functional Programming and Scala
This lesson introduces Scala and functional programming.
Introduction to Scala
Scala Installation (Demo)
Functional Programming
Programming with Scala
Basic Literals and Arithmetic Operators
Logical Operators
Type Inference, Classes, Objects and Functions
Collections and Types
Operations on Lists
Scala REPL
Practice Project: Companies Data (Project)

Module 9: Apache Spark
This lesson covers Apache Spark, how to use the Spark shell, RDDs, and functional programming in Spark.
Introduction to Spark and Its History
Limitations of MapReduce/Hadoop
In-memory Processing
Hadoop Ecosystem vs Spark
Architecture and Components of Spark
Spark Clusters in the Real World
Running Scala Programs in the Spark Shell (Demo)
Setting Up a Spark IDE (Demo)
Spark Web UI (Demo)

Module 10: Spark RDD
This lesson covers RDDs in detail and all the operations associated with them, including key-value pair RDDs and other pair RDD operations.
You will learn about RDD lineage, caching, distributed persistence, RDD persistence storage levels and how to choose the right one, RDD fault tolerance, RDD partitions, how to partition a file-based RDD, HDFS and data locality, parallel operations in Spark, Spark stages, and how to control the level of parallelism.
Introduction to Spark RDD
Creating Spark RDD
Pair RDD
RDD Operations
Spark Transformations Using Scala/Python (Demo)
Spark Actions Using Scala/Python
Caching and Persistence
Storage Levels
Lineage and DAG
Debugging in Spark
Partitioning in Spark
Scheduling in Spark
Shuffling in Spark
Sort and Shuffle
Aggregating Data with Pair RDD
Different File Formats
Real-World Application
Optimizing Spark Jobs
Practice Project: Bus Breakdown and Delay (Project)

Module 11: Spark SQL and DataFrames
In this lesson you will learn about Spark SQL and SQLContext, creating DataFrames, transforming and querying DataFrames, and comparing Spark SQL with Impala. It also covers Spark Streaming concepts with window and join operations.
Spark SQL Introduction
Spark SQL Architecture
DataFrames (Demo)
Various Data Formats
DataFrame Operations
UDF and UDAF
RDD vs DataFrame vs Dataset
Practice Project: Companies Detail (Project)

Module 12: Spark MLlib
In this lesson you will learn about Spark use cases, interactive algorithms in Spark, machine learning, and the k-means algorithm.
Introduction to Spark MLlib
Modelling Big Data with Spark
Analytics in Spark
Machine Learning
Supervised and Unsupervised Learning
Linear Regression (Demo)
Clustering (Demo)
K-Means
Reinforcement Learning
Semi-supervised Learning
MLlib Pipelines
Practice Project: Spark MLlib Diamond Pricing (Project)

Module 13: Spark Streaming
In this lesson you will learn about Spark Streaming concepts with window and join operations.
Streaming Overview
Real-Time Processing of Big Data (Demo)
Data Processing Architecture
Spark Streaming
Writing a Spark Streaming Application (Demo)
Introduction to DStreams
Transformations on DStreams
Design Patterns
Windowing (Demo)
Join Operations
Processing a Twitter Dataset (Demo)
Structured Spark Streaming
Output Sinks
Structured Streaming APIs
Streaming Pipelines (Demo)
Practice Project: Spark Streaming (Project)

Module 14: Spark GraphX
In this lesson you will learn about graph processing and analysis.
Spark GraphX
Introduction to Graphs
Graph Operators
Join Operators
Graph Parallel System
Algorithms in Spark
Pregel API
GraphX Vertex Predicate (Demo)
PageRank Algorithm (Demo)
Practice Project: Flight Times (Project)

Module 15: Course-End Project
Car Insurance Analysis (Project)
Transactional Data Analysis