This program follows a set structure with 6 core courses and 3 electives spread across 29 weeks. It makes you an expert in key technologies related to the Big Data ecosystem. At the end of each core course, you will work on a real-time project to gain hands-on expertise. By the end of the program, you will be ready for seasoned Big Data job roles.
Big Data Architect Masters Program
1. Java Essentials
Introduction to Java
Goal:
In this module, you will learn about Java architecture and the advantages of Java, and develop code using various data types, conditions, and loops.
Objectives:
At the end of this module, you will be able to
• Understand the advantages of Java
• Understand where Java is used
• Understand how memory management is handled in Java
• Create a Java project in Eclipse and execute it
• Implement if..else construct in Java
• Develop codes using various data types in Java
• Implement various loops
Topics:
• Introduction to Java
• Bytecode
• Class Files
• Compilation Process
• Data types and Operations
• If conditions
• Loops - for, while and do while
Hands On/Demo:
• Data Types and Operations
• if Condition
• for..loop
• while..loop
• do..while loop
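For reference, here is a minimal, self-contained Java sketch that ties the hands-on items above together: data types and operations, an if..else condition, and the for, while and do..while loops. The class and variable names are illustrative and not taken from the course material.

```java
public class BasicsDemo {
    public static void main(String[] args) {
        // Data types and operations
        int count = 5;
        double price = 19.99;
        boolean inStock = true;

        // if..else condition
        if (inStock && count > 0) {
            System.out.println("Total: " + (count * price));
        } else {
            System.out.println("Out of stock");
        }

        // for loop
        for (int i = 1; i <= 3; i++) {
            System.out.println("for iteration " + i);
        }

        // while loop
        int w = 0;
        while (w < 3) {
            System.out.println("while iteration " + w);
            w++;
        }

        // do..while loop: body runs at least once
        int d = 0;
        do {
            System.out.println("do..while iteration " + d);
            d++;
        } while (d < 3);
    }
}
```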
Data Handling and Functions
Goal:
In this module, you will learn how to code with arrays, functions and strings using examples and programs.
Objectives:
At the end of this module, you will be able to
• Implement Single and Multi-dimensional array
• Declare and Define Functions
• Call Functions by value and by reference
• Implement Method Overloading
• Use the String data type and the StringBuffer class
Topics:
• Arrays - Single Dimensional and Multidimensional arrays
• Functions
• Function with Arguments
• Function Overloading
• Concept of Static Polymorphism
• String Handling: String and StringBuffer Classes
Hands On/Demo:
• Declaring the arrays
• Accepting data for the arrays
• Calling functions that take arguments, searching the array, and displaying the matching record
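A small illustrative sketch of this module's hands-on items: single- and multi-dimensional arrays, a search function called with arguments, method overloading (static polymorphism), and StringBuffer. All names and sample values are hypothetical.

```java
public class DataHandlingDemo {
    // Method overloading (static polymorphism): same name, different parameter lists
    static int search(int[] data, int key) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == key) return i;
        }
        return -1;
    }

    static int search(String[] data, String key) {
        for (int i = 0; i < data.length; i++) {
            if (data[i].equals(key)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Single and multi-dimensional arrays
        int[] ids = {101, 102, 103};
        int[][] matrix = {{1, 2}, {3, 4}};

        System.out.println("103 found at index " + search(ids, 103));
        System.out.println("matrix[1][0] = " + matrix[1][0]);

        // String is immutable; StringBuffer is mutable
        String name = "Big";
        StringBuffer sb = new StringBuffer(name);
        sb.append(" Data");
        System.out.println(sb.toString());
    }
}
```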
Object Oriented Programming in Java
Goal:
In this module, you will learn object-oriented programming in Java using classes and objects, along with concepts like abstract, final and static.
Objectives:
At the end of this module, you will be able to
• Implement classes and objects in Java
• Create class constructors
• Overload constructors
• Inherit classes and create sub-classes
• Implement abstract classes and methods
• Use static keyword
Topics:
• OOPS in Java:
o Concept of Object Orientation
o Attributes and Methods
o Classes and Objects
• Methods and Constructors
o Default Constructors
o Constructors with Arguments
o Inheritance
o Abstract
o Final and Static
Hands On/Demo:
• Inheritance
• Overloading
• Overriding
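The following hedged sketch shows the module's core ideas in one place: an abstract class with a constructor, the final and static keywords, inheritance, and method overriding. The class names are illustrative only.

```java
// Abstract base class with a constructor and an abstract method
abstract class Shape {
    protected final String name;      // final: assigned once in the constructor
    static int shapesCreated = 0;     // static: shared across all instances

    Shape(String name) {
        this.name = name;
        shapesCreated++;
    }

    abstract double area();           // must be overridden by subclasses
}

// Inheritance: Circle extends Shape and overrides area()
class Circle extends Shape {
    private final double radius;

    Circle(double radius) {
        super("circle");
        this.radius = radius;
    }

    @Override
    double area() {
        return Math.PI * radius * radius;
    }
}

public class OopsDemo {
    public static void main(String[] args) {
        Shape s = new Circle(2.0);    // upcasting: a Circle is-a Shape
        System.out.println(s.name + " area = " + s.area());
        System.out.println("Shapes created: " + Shape.shapesCreated);
    }
}
```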
Packages and Multi-Threading
Goal:
In this module, you will learn about packages in Java and scope specifiers of Java. You will also learn exception handling and how multi-threading works in Java.
Objectives:
At the end of this module, you will be able to
• Implement interface and use it
• Extend interface with other interface
• Create a package and name it; import packages while creating a new class
• Understand various exceptions
• Handle exception using try catch block
• Handle exception using throw and throws keyword
• Implement threads using thread class and runnable interface
• Understand and implement multithreading
Topics:
• Packages and Interfaces
• Access Specifiers
• Package
• Exception Handling
• Multi-Threading
Hands On/Demo:
• Interfaces
• Packages
• Exception
• Thread
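As an illustration of this module's exception handling and multi-threading topics, here is a minimal sketch using a user-defined interface, a try..catch block, and threads created via the Runnable interface. All names are hypothetical.

```java
// A small interface that a worker implementation can satisfy
interface Task {
    void execute() throws Exception;
}

public class ThreadingDemo {
    public static void main(String[] args) throws InterruptedException {
        Task risky = () -> {
            throw new IllegalStateException("simulated failure");
        };

        // Exception handling with try..catch
        try {
            risky.execute();
        } catch (Exception e) {
            System.out.println("Caught: " + e.getMessage());
        }

        // Multi-threading using the Runnable interface
        Runnable worker = () -> System.out.println(
                "Running in " + Thread.currentThread().getName());
        Thread t1 = new Thread(worker, "worker-1");
        Thread t2 = new Thread(worker, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```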
Collections
Goal:
In this module, you will learn how to write code with Wrapper classes, Inner classes and Applet programs, and how to use the java.io, java.lang and java.util packages along with the Collections framework.
Objectives:
At the end of this module, you will be able to
• Identify and use important built-in Java packages like java.lang, java.io, java.util, etc.
• Use Wrapper classes
• Understand collections framework
• Implement logic using ArrayList and Vector and Queue
• Use set, HashSet and TreeSet
• Implement logic using Map, HashMap and Hashtable
Topics:
• Wrapper Classes and Inner Classes: Integer, Character, Boolean, Float etc.
• Applet Programs: how to write UI programs with Applet; java.lang, java.io, java.util
• Collections: ArrayList, Vector, HashSet, TreeSet, HashMap, HashTable.
Hands On/Demo:
• Wrapper class
• Collection
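A short sketch of the Collections framework topics above, using wrapper classes, ArrayList, HashSet and HashMap. The sample values are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CollectionsDemo {
    public static void main(String[] args) {
        // Wrapper classes: primitives boxed into objects
        Integer boxed = Integer.valueOf(42);
        int unboxed = boxed;                      // auto-unboxing

        // ArrayList: ordered, allows duplicates
        List<String> tools = new ArrayList<>();
        tools.add("Hadoop");
        tools.add("Spark");
        tools.add("Hadoop");

        // HashSet: removes duplicates
        Set<String> unique = new HashSet<>(tools);

        // HashMap: key-value lookups
        Map<String, Integer> releaseYear = new HashMap<>();
        releaseYear.put("Hadoop", 2006);
        releaseYear.put("Spark", 2014);

        System.out.println(unboxed + " " + tools + " " + unique);
        System.out.println("Spark released in " + releaseYear.get("Spark"));
    }
}
```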
2. Big Data Hadoop Certification Training
Understanding Big Data and Hadoop
Learning Objectives: In this module, you will understand what Big Data is, the limitations of the traditional solutions for Big Data problems, how Hadoop solves those Big Data problems, Hadoop Ecosystem, Hadoop Architecture, HDFS, Anatomy of File Read and Write & how MapReduce works.
Topics:
- Introduction to Big Data & Big Data Challenges
- Limitations & Solutions of Big Data Architecture
- Hadoop & its Features
- Hadoop Ecosystem
- Hadoop 2.x Core Components
- Hadoop Storage: HDFS (Hadoop Distributed File System)
- Hadoop Processing: MapReduce Framework
- Different Hadoop Distributions
Hadoop Architecture and HDFS
Learning Objectives: In this module, you will learn Hadoop Cluster Architecture, important configuration files of Hadoop Cluster, Data Loading Techniques using Sqoop & Flume, and how to setup Single Node and Multi-Node Hadoop Cluster.
Topics:
- Hadoop 2.x Cluster Architecture
- Federation and High Availability Architecture
- Typical Production Hadoop Cluster
- Hadoop Cluster Modes
- Common Hadoop Shell Commands
- Hadoop 2.x Configuration Files
- Single Node Cluster & Multi-Node Cluster set up
- Basic Hadoop Administration
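The module's data loading is done with the Hadoop shell commands, Sqoop and Flume; purely as a companion sketch, the same basic HDFS operations can also be issued from Java through the FileSystem API. The paths below are illustrative, and the code assumes a reachable cluster configured via core-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS and other settings from core-site.xml
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Roughly equivalent to: hadoop fs -put data.txt /user/edureka/data.txt
        fs.copyFromLocalFile(new Path("data.txt"),
                             new Path("/user/edureka/data.txt"));

        // Roughly equivalent to: hadoop fs -ls /user/edureka
        for (FileStatus status : fs.listStatus(new Path("/user/edureka"))) {
            System.out.println(status.getPath() + " " + status.getLen() + " bytes");
        }

        fs.close();
    }
}
```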
Hadoop MapReduce Framework
Learning Objectives: In this module, you will understand the Hadoop MapReduce framework comprehensively and how MapReduce works on data stored in HDFS. You will also learn advanced MapReduce concepts like Input Splits, Combiner & Partitioner.
Topics:
- Traditional way vs MapReduce way
- Why MapReduce
- YARN Components
- YARN Architecture
- YARN MapReduce Application Execution Flow
- YARN Workflow
- Anatomy of MapReduce Program
- Input Splits, Relation between Input Splits and HDFS Blocks
- MapReduce: Combiner & Partitioner
- Demo of Health Care Dataset
- Demo of Weather Dataset
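For orientation, here is the classic WordCount job as a minimal sketch of the Mapper, Reducer (reused as a Combiner) and driver described above, written against the Hadoop 2.x MapReduce API. The input and output paths are passed on the command line; this is an illustrative sketch, not the course's demo code.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits one (word, 1) pair per token in its input split
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word; also usable as a map-side Combiner
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner runs on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```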
Advanced Hadoop MapReduce
Learning Objectives: In this module, you will learn Advanced MapReduce concepts such as Counters, Distributed Cache, MRUnit, Reduce Join, Custom Input Format, Sequence Input Format and XML parsing.
Topics:
- Counters
- Distributed Cache
- MRUnit
- Reduce Join
- Custom Input Format
- Sequence Input Format
- XML file Parsing using MapReduce
Apache Pig
Learning Objectives: In this module, you will learn about Apache Pig, the types of use cases where Pig can be used, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming & testing Pig scripts. You will also be working on a healthcare dataset.
Topics:
- Introduction to Apache Pig
- MapReduce vs Pig
- Pig Components & Pig Execution
- Pig Data Types & Data Models in Pig
- Pig Latin Programs
- Shell and Utility Commands
- Pig UDF & Pig Streaming
- Testing Pig scripts with PigUnit
- Aviation use-case in PIG
- Pig Demo of Healthcare Dataset
Apache Hive
Learning Objectives: This module will help you in understanding Hive concepts, Hive Data types, loading and querying data in Hive, running hive scripts and Hive UDF.
Topics:
- Introduction to Apache Hive
- Hive vs Pig
- Hive Architecture and Components
- Hive Metastore
- Limitations of Hive
- Comparison with Traditional Database
- Hive Data Types and Data Models
- Hive Partition
- Hive Bucketing
- Hive Tables (Managed Tables and External Tables)
- Importing Data
- Querying Data & Managing Outputs
- Hive Script & Hive UDF
- Retail use case in Hive
- Hive Demo on Healthcare Dataset
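The course runs HiveQL from the Hive shell and Hive scripts; as a hedged companion example, the same queries can also be submitted from Java over the HiveServer2 JDBC interface. The connection URL, table name and HDFS location below are assumptions for illustration only.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcDemo {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC endpoint; host, port and database are illustrative
        String url = "jdbc:hive2://localhost:10000/default";
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        try (Connection conn = DriverManager.getConnection(url, "", "");
             Statement stmt = conn.createStatement()) {

            // External table over data already sitting in HDFS (hypothetical path)
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS retail_sales ("
                    + "txn_id STRING, product STRING, amount DOUBLE) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' "
                    + "LOCATION '/user/edureka/retail'");

            // Query the table and print the aggregated results
            ResultSet rs = stmt.executeQuery(
                    "SELECT product, SUM(amount) FROM retail_sales GROUP BY product");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getDouble(2));
            }
        }
    }
}
```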
Advanced Apache Hive and HBase
Learning Objectives: In this module, you will understand advanced Apache Hive concepts such as UDF, Dynamic Partitioning, Hive indexes and views, and optimizations in Hive. You will also acquire in-depth knowledge of Apache HBase, HBase Architecture, HBase running modes and its components.
Topics:
- Hive QL: Joining Tables, Dynamic Partitioning
- Custom MapReduce Scripts
- Hive Indexes and views
- Hive Query Optimizers
- Hive Thrift Server
- Hive UDF
- Apache HBase: Introduction to NoSQL Databases and HBase
- HBase v/s RDBMS
- HBase Components
- HBase Architecture
- HBase Run Modes
- HBase Configuration
- HBase Cluster Deployment
Advanced Apache HBase
Learning Objectives: This module will cover advanced Apache HBase concepts. We will see demos on HBase Bulk Loading & HBase Filters. You will also learn what Zookeeper is all about, how it helps in monitoring a cluster & why HBase uses Zookeeper.
Topics:
- HBase Data Model
- HBase Shell
- HBase Client API
- HBase Data Loading Techniques
- Apache Zookeeper Introduction
- ZooKeeper Data Model
- Zookeeper Service
- HBase Bulk Loading
- Getting and Inserting Data
- HBase Filters
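A minimal sketch of the HBase Client API mentioned above. It assumes an HBase table named `patients` with a column family `info` already exists (both names are hypothetical) and that hbase-site.xml is on the classpath so the ZooKeeper quorum can be discovered.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseClientDemo {
    public static void main(String[] args) throws Exception {
        // Picks up the ZooKeeper quorum and other settings from hbase-site.xml
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("patients"))) {

            // Insert a cell: row key, column family, qualifier, value
            Put put = new Put(Bytes.toBytes("patient-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                          Bytes.toBytes("A. Sharma"));
            table.put(put);

            // Read the row back
            Result result = table.get(new Get(Bytes.toBytes("patient-001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}
```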
Processing Distributed Data with Apache Spark
Learning Objectives: In this module, you will learn what Apache Spark is, along with SparkContext & the Spark Ecosystem. You will learn how to work with Resilient Distributed Datasets (RDDs) in Apache Spark. You will be running applications on a Spark cluster & comparing the performance of MapReduce and Spark.
Topics:
- What is Spark
- Spark Ecosystem
- Spark Components
- What is Scala
- Why Scala
- SparkContext
- Spark RDD
Oozie and Hadoop Project
Learning Objectives: In this module, you will understand how multiple Hadoop ecosystem components work together to solve Big Data problems. This module will also cover Flume & Sqoop demo, Apache Oozie Workflow Scheduler for Hadoop Jobs, and Hadoop Talend integration.
Topics:
- Oozie
- Oozie Components
- Oozie Workflow
- Scheduling Jobs with Oozie Scheduler
- Demo of Oozie Workflow
- Oozie Coordinator
- Oozie Commands
- Oozie Web Console
- Oozie for MapReduce
- Combining flow of MapReduce Jobs
- Hive in Oozie
- Hadoop Project Demo
- Hadoop Talend Integration
Certification Project
1) Analysis of an Online Book Store
A. Find out the frequency of books published each year. (Hint: A sample dataset will be provided)
B. Find out in which year the maximum number of books were published
C. Find out how many books were published based on ranking in the year 2002.
Sample Dataset Description
The Book-Crossing dataset consists of 3 tables that will be provided to you.
2) Airlines Analysis
A. Find the list of airports operating in India
B. Find the list of airlines having zero stops
C. List the airlines operating with code share
D. Find which country (or territory) has the highest number of airports
E. Find the list of active airlines in the United States
Sample Dataset Description
In this use case, there are 3 datasets: Final_airlines, routes.dat and airports_mod.dat
3. Apache Spark and Scala Certification Training
Introduction to Big Data Hadoop and Spark
Learning Objectives:
- Understand Big Data and its components such as HDFS. You will learn about the Hadoop Cluster Architecture, get an introduction to Spark, and understand the difference between batch processing and real-time processing.
Topics:
- What is Big Data?
- Big Data Customer Scenarios
- Limitations and Solutions of Existing Data Analytics Architecture with Uber Use Case
- How Hadoop Solves the Big Data Problem?
- What is Hadoop?
- Hadoop’s Key Characteristics
- Hadoop Ecosystem and HDFS
- Hadoop Core Components
- Rack Awareness and Block Replication
- YARN and its Advantage
- Hadoop Cluster and its Architecture
- Hadoop: Different Cluster Modes
- Big Data Analytics with Batch & Real-time Processing
- Why Spark is needed?
- What is Spark?
- How Spark differs from other frameworks?
- Spark at Yahoo!
Introduction to Scala for Apache Spark
Learning Objectives:
- Learn the basics of Scala that are required for programming Spark applications. You will also learn about the basic constructs of Scala such as variable types, control structures, collections such as Array, ArrayBuffer, Map, Lists, and many more.
Topics:
- What is Scala?
- Why Scala for Spark?
- Scala in other Frameworks
- Introduction to Scala REPL
- Basic Scala Operations
- Variable Types in Scala
- Control Structures in Scala
- Foreach loop, Functions and Procedures
- Collections in Scala: Array, ArrayBuffer, Map, Tuples, Lists, and more
Hands-on:
- Scala REPL Detailed Demo
Functional Programming and OOPs Concepts in Scala
Learning Objectives:
- In this module, you will learn about object-oriented programming and functional programming techniques in Scala.
Topics:
- Functional Programming
- Higher Order Functions
- Anonymous Functions
- Class in Scala
- Getters and Setters
- Custom Getters and Setters
- Properties with only Getters
- Auxiliary Constructor and Primary Constructor
- Singletons
- Extending a Class
- Overriding Methods
- Traits as Interfaces and Layered Traits
Hands-on:
- OOPs Concepts
- Functional Programming
Deep Dive into Apache Spark Framework
Learning Objectives:
- Understand Apache Spark and learn how to develop Spark applications. At the end, you will learn how to perform data ingestion using Sqoop.
Topics:
- Spark’s Place in Hadoop Ecosystem
- Spark Components & its Architecture
- Spark Deployment Modes
- Introduction to Spark Shell
- Writing your first Spark Job Using SBT
- Submitting Spark Job
- Spark Web UI
- Data Ingestion using Sqoop
Hands-on:
- Building and Running Spark Application
- Spark Application Web UI
- Configuring Spark Properties
- Data ingestion using Sqoop
Playing with Spark RDDs
Learning Objectives:
- Get an insight into Spark RDDs and other RDD-related manipulations for implementing business logic (Transformations, Actions and Functions performed on RDDs).
Topics:
- Challenges in Existing Computing Methods
- Probable Solution & How RDD Solves the Problem
- What is RDD, It’s Operations, Transformations & Actions
- Data Loading and Saving Through RDDs
- Key-Value Pair RDDs
- Other Pair RDDs, Two Pair RDDs
- RDD Lineage
- RDD Persistence
- WordCount Program Using RDD Concepts
- RDD Partitioning & How It Helps Achieve Parallelization
- Passing Functions to Spark
Hands-on:
- Loading data in RDDs
- Saving data through RDDs
- RDD Transformations
- RDD Actions and Functions
- RDD Partitions
- WordCount through RDDs
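The course implements these RDD exercises in Scala; for consistency with the Java examples elsewhere in this program, here is a hedged WordCount sketch using Spark's Java RDD API. The HDFS paths are illustrative.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class RddWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("RddWordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Load data into an RDD (input path is illustrative)
        JavaRDD<String> lines = sc.textFile("hdfs:///user/edureka/input.txt");

        // Transformations are lazy; nothing runs until an action is invoked
        JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);

        // Action: save the result back to HDFS
        counts.saveAsTextFile("hdfs:///user/edureka/wordcount-output");
        sc.stop();
    }
}
```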
DataFrames and Spark SQL
Learning Objectives:
- In this module, you will learn about Spark SQL, which is used to process structured data with SQL queries. You will cover data frames and datasets in Spark SQL along with the different kinds of SQL operations performed on data frames. You will also learn about Spark and Hive integration.
Topics:
- Need for Spark SQL
- What is Spark SQL?
- Spark SQL Architecture
- SQL Context in Spark SQL
- User Defined Functions
- Data Frames & Datasets
- Interoperating with RDDs
- JSON and Parquet File Formats
- Loading Data through Different Sources
- Spark – Hive Integration
Hands-on:
- Spark SQL – Creating Data Frames
- Loading and Transforming Data through Different Sources
- Stock Market Analysis
- Spark-Hive Integration
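Again hedged into Java rather than the course's Scala, a minimal Spark SQL sketch: creating a DataFrame from a JSON source, registering a temporary view, querying it with SQL, and writing Parquet. The file names and column names (symbol, close) are assumptions for illustration.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSqlDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SparkSqlDemo")
                .master("local[*]")
                .getOrCreate();

        // Create a DataFrame from a JSON source (file path is illustrative)
        Dataset<Row> stocks = spark.read().json("stocks.json");
        stocks.printSchema();

        // Register the DataFrame as a temporary view and query it with SQL
        stocks.createOrReplaceTempView("stocks");
        Dataset<Row> top = spark.sql(
                "SELECT symbol, AVG(close) AS avg_close "
                + "FROM stocks GROUP BY symbol ORDER BY avg_close DESC LIMIT 10");
        top.show();

        // Write the result out in Parquet format
        top.write().mode("overwrite").parquet("top_stocks.parquet");
        spark.stop();
    }
}
```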
Machine Learning using Spark MLlib
Learning Objectives:
- Learn why machine learning is needed, different Machine Learning techniques/algorithms, and Spark MLlib.
Topics:
- Why Machine Learning?
- What is Machine Learning?
- Where Machine Learning is Used?
- Face Detection: USE CASE
- Different Types of Machine Learning Techniques
- Introduction to MLlib
- Features of MLlib and MLlib Tools
- Various ML algorithms supported by MLlib
Deep Dive into Spark MLlib
Learning Objectives:
- Implement various algorithms supported by MLlib such as Linear Regression, Decision Tree, Random Forest and many more.
Topics:
- Supervised Learning - Linear Regression, Logistic Regression, Decision Tree, Random Forest
- Unsupervised Learning - K-Means Clustering & How It Works with MLlib
- Analysis on US Election Data using MLlib (K-Means)
Hands-on:
- Machine Learning MLlib
- K- Means Clustering
- Linear Regression
- Logistic Regression
- Decision Tree
- Random Forest
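As a rough illustration of the K-Means analysis above (the course uses Scala and the US election dataset; the CSV file and feature column names below are placeholders), here is a hedged Spark ML K-Means sketch in Java.

```java
import org.apache.spark.ml.clustering.KMeans;
import org.apache.spark.ml.clustering.KMeansModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.linalg.Vector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KMeansSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("KMeansSketch").master("local[*]").getOrCreate();

        // Load a CSV of numeric features (file and column names are illustrative)
        Dataset<Row> raw = spark.read()
                .option("header", "true").option("inferSchema", "true")
                .csv("election_data.csv");

        // Assemble the numeric columns into a single feature vector column
        Dataset<Row> data = new VectorAssembler()
                .setInputCols(new String[]{"feature1", "feature2"})
                .setOutputCol("features")
                .transform(raw);

        // Fit a K-Means model with 3 clusters
        KMeansModel model = new KMeans().setK(3).setSeed(1L).fit(data);
        for (Vector center : model.clusterCenters()) {
            System.out.println("cluster center: " + center);
        }

        // Assign each row to its nearest cluster
        model.transform(data).select("features", "prediction").show(5);
        spark.stop();
    }
}
```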
Understanding Apache Kafka and Apache Flume
Learning Objectives:
- Understand Kafka and its architecture. Also, learn about Kafka Clusters and how to configure different types of Kafka Clusters. Get introduced to Apache Flume, its architecture, and how it is integrated with Apache Kafka for event processing. At the end, learn how to ingest streaming data using Flume.
Topics:
- Need for Kafka
- What is Kafka?
- Core Concepts of Kafka
- Kafka Architecture
- Where is Kafka Used?
- Understanding the Components of Kafka Cluster
- Configuring Kafka Cluster
- Kafka Producer and Consumer Java API
- Need of Apache Flume
- What is Apache Flume?
- Basic Flume Architecture
- Flume Sources
- Flume Sinks
- Flume Channels
- Flume Configuration
- Integrating Apache Flume and Apache Kafka
Hands-on:
- Configuring Single Node Single Broker Cluster
- Configuring Single Node Multi Broker Cluster
- Producing and consuming messages
- Flume Commands
- Setting up Flume Agent
- Streaming Twitter Data into HDFS
Apache Spark Streaming - Processing Multiple Batches
Learning Objectives:
- Work on Spark streaming which is used to build scalable fault-tolerant streaming applications. Also, learn about DStreams and various Transformations performed on the streaming data. You will get to know about commonly used streaming operators such as Sliding Window Operators and Stateful Operators.
Topics:
- Drawbacks in Existing Computing Methods
- Why Streaming is Necessary?
- What is Spark Streaming?
- Spark Streaming Features
- Spark Streaming Workflow
- How Uber Uses Streaming Data
- Streaming Context & DStreams
- Transformations on DStreams
- Describe Windowed Operators and why they are useful
- Important Windowed Operators
- Slice, Window and ReduceByWindow Operators
- Stateful Operators
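A hedged sketch of a DStream word count with a windowed operator, using Spark Streaming's Java API and a socket source instead of the course's data sources. The host, port and durations are illustrative (for example, feed the socket with `nc -lk 9999`).

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingWordCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]");
        // Micro-batches of 10 seconds
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

        // Text stream from a socket source (host and port are illustrative)
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        JavaDStream<String> words =
                lines.flatMap(line -> Arrays.asList(line.split("\\s+")).iterator());

        // Windowed operator: counts over the last 60 seconds, sliding every 10 seconds
        JavaPairDStream<String, Integer> windowedCounts = words
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKeyAndWindow((a, b) -> a + b,
                        Durations.seconds(60), Durations.seconds(10));

        windowedCounts.print();
        jssc.start();
        jssc.awaitTermination();
    }
}
```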
Apache Spark Streaming - Data Sources
Learning Objectives:
- In this module, you will learn about the different streaming data sources such as Kafka and Flume. At the end of the module, you will be able to create a Spark Streaming application.
Topics:
- Apache Spark Streaming: Data Sources
- Streaming Data Source Overview
- Apache Flume and Apache Kafka Data Sources
- Example: Using a Kafka Direct Data Source
- Perform Twitter Sentiment Analysis Using Spark Streaming
Hands-on:
- Different Streaming Data Sources
In-class Project
Learning Objectives:
- Work on an end-to-end Financial domain project covering all the major concepts of Spark taught during the course.
Spark GraphX (Self-Paced)
Learning Objectives:
- In this module, you will be learning the key concepts of Spark GraphX programming and operations along with different GraphX algorithms and their implementations.
4. Apache Cassandra Certification Training
Introduction to Big Data, and Cassandra
Goal: In this module, you will get a brief introduction to Big Data and how it creates problems for traditional database management systems (RDBMS). You will also learn how Cassandra solves these problems and understand Cassandra’s features.
Skills:
- Basic concepts of Cassandra
Objectives:
At the end of this module, you will be able to
- Explain what is Big Data
- List the Limitations of RDBMS
- Define NoSQL and it’s Characteristics
- Define CAP Theorem
- Learn Cassandra
- List the Features of Cassandra
- Get a Tour of Edureka’s VM
Topics:
- Introduction to Big Data and Problems caused by it
- 5V – Volume, Variety, Velocity, Veracity and Value
- Traditional Database Management System
- Limitations of RDBMS
- NoSQL databases
- Common characteristics of NoSQL databases
- CAP theorem
- How Cassandra solves the Limitations?
- History of Cassandra
- Features of Cassandra
Hands On:
- Edureka VM tour
Cassandra Data Model
Goal: In this module, you will learn about Database Model and similarities between RDBMS and Cassandra Data Model. You will also understand the key Database Elements of Cassandra and learn about the concept of Primary Key.
Skills:
- Data Modelling in Cassandra
- Data Structure Design
Objectives:
At the end of this module, you will be able to
- Explain what Database Modelling is and its Features
- Describe the Different Types of Data Models
- List the Difference between RDBMS and Cassandra Data Model
- Define Cassandra Data Model
- Explain Cassandra Database Elements
- Implement Keyspace Creation, Updating and Deletion
- Implement Table Creation, Updating and Deletion
Topics:
- Introduction to Database Model
- Understand the analogy between RDBMS and Cassandra Data Model
- Understand following Database Elements: Cluster, Keyspace, Column Family/Table, Column
- Column Family Options
- Columns
- Wide Rows, Skinny Rows
- Static and dynamic tables
Hands-On:
- Creating Keyspace
- Creating Tables
Cassandra Architecture
Goal: Gain knowledge of architecting and creating Cassandra Database Systems. In addition, learn about the complex inner workings of Cassandra such as Gossip Protocol, Read Repairs and so on.
Skills:
• Cassandra Architecture
Objectives: At the end of this module, you will be able to:
• Explain the Architecture of Cassandra
• Describe the Different Layers of Cassandra Architecture
• Learn about Gossip Protocol
• Describe Partitioning and Snitches
• Explain Vnodes and How Read and Write Path works
• Understand Compaction, Anti-Entropy and Tombstone
• Describe Repairs in Cassandra
• Explain Hinted Handoff
Topics:
• Cassandra as a Distributed Database
• Key Cassandra Elements
a. Memtable
b. Commit log
c. SSTables
• Replication Factor
• Data Replication in Cassandra
• Gossip protocol – Detecting failures
• Gossip: Uses
• Snitch: Uses
• Data Distribution
• Staged Event-Driven Architecture (SEDA)
• Managers and Services
• Virtual Nodes: Write path and Read path
• Consistency level
• Repair
• Incremental repair
Deep Dive into Cassandra Database
Goal: In this module you will learn about Keyspace and its attributes in Cassandra. You will also create Keyspace, learn how to create a Table and perform operations like Inserting, Updating and Deleting data from a table while using CQLSH.
Skills:
• Database Operations
• Table Operations
Objectives: At the end of this module, you will be able to:
• Describe Different Data Types Used in Cassandra
• Explain Collection Types
• Describe What are CRUD Operations
• Implement Insert, Select, Update and Delete of various elements
• Implement Various Functions Used in Cassandra
• Describe Importance of Roles and Indexing
• Understand tombstones in Cassandra
Topics:
• Replication Factor
• Replication Strategy
• Defining columns and data types
• Defining a partition key
• Recognizing a partition key
• Specifying a descending clustering order
• Updating data
• Tombstones
• Deleting data
• Using TTL
• Updating a TTL
Hands-on/Demo
• Create Keyspace in Cassandra
• Check Created Keyspace in System_Schema.Keyspaces
• Update Replication Factor of Previously Created Keyspace
• Drop Previously Created Keyspace
• Create A Table Using cqlsh
• Create A Table Using UUID & TIMEUUID
• Create A Table Using Collection & UDT Column
• Create Secondary Index On a Table
• Insert Data Into Table
• Insert Data into Table with UUID & TIMEUUID Columns
• Insert Data Using COPY Command
• Deleting Data from Table
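The hands-on above is driven from cqlsh; as a companion sketch only, the same CQL can be executed from Java with the DataStax Java driver (the 3.x API is assumed; the keyspace, table and sample data are made up).

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraCrudDemo {
    public static void main(String[] args) {
        // Contact point is illustrative; the driver defaults to port 9042
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Keyspace with SimpleStrategy and replication factor 1 (single-node setup)
        session.execute("CREATE KEYSPACE IF NOT EXISTS edureka WITH replication = "
                + "{'class': 'SimpleStrategy', 'replication_factor': 1}");

        // Table with a UUID primary key, then an insert that also sets a TTL
        session.execute("CREATE TABLE IF NOT EXISTS edureka.users ("
                + "id uuid PRIMARY KEY, name text, city text)");
        session.execute("INSERT INTO edureka.users (id, name, city) "
                + "VALUES (uuid(), 'Asha', 'Pune') USING TTL 86400");

        // Read the rows back
        ResultSet rs = session.execute("SELECT id, name, city FROM edureka.users");
        for (Row row : rs) {
            System.out.println(row.getUUID("id") + " " + row.getString("name")
                    + " " + row.getString("city"));
        }

        session.close();
        cluster.close();
    }
}
```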
Node Operations in a Cluster
Goal: Learn how to add nodes in Cassandra and configure nodes using the “cassandra.yaml” file. Use Nodetool to remove a node and restore it back into service. In addition, using the Nodetool repair command, learn the importance of repair and how the repair operation functions.
Skills:
• Node Operations
Objectives: At the end of this module, you will be able to:
• Explain Cassandra Nodes
• Understand Seed Nodes
• Configure Seed Nodes using cassandra.yaml file
• Add/bootstrap a node in a Cluster
• Use Nodetool utility to decommission a node from the cluster
• Remove a Dead Node from a Cluster
• Describe the need to repair Nodes
• Use Nodetool repair command
Topics:
• Cassandra nodes
• Specifying seed nodes
• Bootstrapping a node
• Adding a node (Commissioning) in Cluster
• Removing (Decommissioning) a node
• Removing a dead node
• Repair
• Read Repair
• What’s new in incremental repair
• Run a Repair Operation
• Cassandra and Spark Implementation
Hands On:
• Commissioning a Node
• Decommissioning a Node
• Nodetool Commands
Managing and Monitoring the Cluster
Goal: The key aspects of monitoring Cassandra are the resources used by each node, response latencies to requests, requests to offline nodes, and the compaction process. In this module, learn to use various monitoring tools in Cassandra such as Nodetool and JConsole.
Skills:
• Clustering
Objectives: At the end of this module, you will be able to:
• Describe the various monitoring tools available
• Implement nodetool utility to manage a cluster
• Use JConsole to monitor JMX statistics
• Understand OpsCenter tool
Topics:
• Cassandra monitoring tools
• Logging
• Tailing
• Using Nodetool Utility
• Using JConsole
• Learning about OpsCenter
• Runtime Analysis Tools
Hands On:
• JMX and Jconsole
• OpsCenter
Backup & Restore and Performance Tuning
Goal: In this Module you will learn about the importance of Backup and Restore functions in Cassandra and Create Snapshots in Cassandra. You will learn about Hardware selection and Performance Tuning (Configuring Log Files) in Cassandra. You will also learn about Cassandra integration with various other frameworks.
Skills:
• Performance tuning
• Cassandra Design Principles
• Backup and Restoration
Objectives: At the end of this module, you’ll be able to:
• Learn backup and restore functionality and its importance
• Create a snapshot using Nodetool utility
• Restore a snapshot
• Understand how to choose the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
• Understand all the logs created by Cassandra
• Explain the purpose of different log files
• Configure the log files
• Learn about Performance Tuning
• Integration with Spark and Kafka
Topics:
• Creating a Snapshot
• Restoring from a Snapshot
• RAM and CPU recommendations
• Hardware choices
• Selecting storage
• Types of Storage to Avoid
• Cluster connectivity, security and the factors that affect distributed system performance
• End-to-end performance tuning of Cassandra clusters against very large data sets
• Load balance and streams
Hands On:
• Creating Snapshots
• Integration with Kafka
• Integration with Spark
Hosting Cassandra Database on Cloud
Goal: In this Module you will learn about Design, Implementation, and on-going support of Cassandra Operational Data. Finally, you will learn how to Host a Cassandra Database on Cloud.
Skills:
• Security
• Design Implementation
• On-going support of Cassandra Operational Data
Objectives: At the end of this module, you’ll be able to:
• Security
• Learn about DataStax
• Create an End-to-End Project using Cassandra
• Implement a Cassandra Database on Cloud
Topics:
• Security
• Ongoing Support of Cassandra Operational Data
• Hosting a Cassandra Database on Cloud
Hands On:
• Hosting Cassandra Database on Amazon Web Services
5. Talend for Data Integration and Big Data
Talend – A Revolution in Big Data
Learning Objectives: In this module of the Talend Training, you will get an overview of ETL technologies and the reason why Talend is referred to as the next-generation leader in Big Data integration. You will be introduced to the various products offered by Talend Corporation to date and their relevance to Data Integration and Big Data. Further, you will learn about TOS (Talend Open Studio), its architecture and GUI, and how to install TOS.
Skills:
- Core ETL concepts
- Talend products and their features
- Design and implementation of Talend Open Studio
Topics:
- Working with ETL
- Rise of Big Data
- Role of Open Source ETL Technologies in Big Data
- Comparison with other market leader tools in ETL domain
- Importance of Talend (Why Talend)
- Talend and its Products
- Introduction of Talend Open Studio
- TOS for Data Integration
- GUI of TOS with Demo
Hands-on/Demo:
- Creating a basic job
Working with Talend Open Studio for DI
Learning Objectives: In this module of the Talend course, you will learn to work with the various types of data sources and target systems supported by Talend, work with metadata, and read/write popular CSV/delimited and fixed-width files. You will connect to a database to read/write/update data, read complex source systems like Excel and XML, and use some of the basic components like tLogRow and tMap in TOS.
Skills:
- Create jobs with different components and link them
- Read and write files of various format
- Work with Database
Topics:
- Launching Talend Studio
- Working with different workspace directories
- Working with projects
- Creating and executing jobs
- Connection types and triggers
- Most frequently used Talend components [tJava, tLogRow, tMap]
- Read & Write Various Types of Source/Target Systems
- Working with files [CSV, XLS, XML, Positional]
- Working with databases [MySQL DB]
- Metadata management
Hands-on/Demo/Use-case:
- Creating a Business Model
- Adding Components to a Job
- Connecting the Components
- Reading and writing Delimited File
- Reading and writing Positional File
- Reading and writing XML and Xls/Xlsx Files
- Connecting Database(MySQL)
- Retrieving Schema from the Database
- Reading from Database Metadata
- Retrieving data from a file and inserting it into the Database
- Deleting data from Database
- Working with Logs and Error
Basic Transformations in Talend
Learning Objectives: In this module of Talend Training, you will understand Data Mapping and Transformations using TOS. In addition, you will learn how to filter and join various Data Sources using lookups and search and sort through them.
Skills:
- Create and use context variables
- Mapping and Transformations
- Work with components like tFilter, tJoin, tSortRow, tReplicate, tSplit, Lookup
Topics:
- Context Variables
- Using Talend components
- tJoin
- tFilter
- tSortRow
- tAggregateRow
- tReplicate
- tSplit
- Lookup
- tRowGenerator
- Accessing job level/ component level information within the job
- SubJob (using tRunJob, tPreJob, tPostJob)
Hands-on/Demo/Use-case:
- Embedding Context Variables
- Adding different environments
- Data Mapping using tMap
- Using functions in Talend
- tJava
- tSortRow
- tAggregateRow
- tReplicate
- tFilter
- tSplit
- tRowGenerator
- Perform Lookup operations using tJoin
- Creating SubJob (using tRunJob, tPreJob, tPostJob)
Advance Transformations and Executing Jobs remotely in Talend
Learning Objectives: In this module of the Talend Certification training, you will understand transformations and the various steps involved in looping jobs in Talend, ways to search for files in a directory, and how to process them in a sequence. You will also learn to work with FTP connections, export and import jobs, run jobs remotely, and parameterize them from the command line.
Skills:
- Use various file components like tFileList, tFileCopy, tFileExists, tFileDelete, tFileArchive
- Handle logs and errors
- Cast data types using tConvert and tMap expression builder
- Iterate components using tLoop
- Store and retrieve files from FTP
- Remotely access Talend
Topics:
- Various components of file management (like tFileList, tFileArchive, tFileTouch, tFileDelete)
- Error Handling [tWarn, tDie]
- Type Casting (convert datatypes among source-target platforms)
- Looping components (like tLoop, tForeach)
- Using FTP components (like tFTPFileList, tFTPFileExists, tFTPGet, tFTPPut)
- Exporting and Importing Talend jobs
- How to schedule and run Talend DI jobs externally (using Command line)
- Parameterizing a Talend job from command line
Hands-on/Demo/Use-case:
- Implementing File Management (like tFileList, tFileArchive, tFileTouch, tFileDelete)
- Type Casting (tConvert and tMap(using Expression Builder))
- Looping components (like tLoop, tForeach)
- Using FTP components (like tFTPFileList, tFTPFileExists, tFTPGet, tFTPPut)
- Exporting and Importing Talend Jobs
- Parameterizing a Talend Job from command line
Big Data and Hadoop with Talend
Learning Objectives: In this module of Talend Training, you will learn about Big Data and Hadoop concepts, such as HDFS (Hadoop Distributed File System) Architecture, MapReduce, leveraging Big Data through Talend and Talend & Big Data Integration. Learn to set up and use the Talend Open Studio for Big Data. In addition, you will learn to use Big Data connectors in TOS (Talend offers some 800+ connectors for Big Data environment) and access Hadoop Ecosystem from Talend.
Skills:
- Understand scope of Talend Open Studio for Big Data
- Integrate Hadoop HDFS and Talend
- Use Hadoop operations like Map and Aggregate through TOS Big Data
- Perform multiple analyses and store results in HDFS
Topics:
- Big Data and Hadoop
- HDFS and MapReduce
- Benefits of using Talend with Big Data
- Integration of Talend with Big Data
- HDFS commands Vs Talend HDFS utility
- Big Data setup using Hortonworks Sandbox in your personal computer
- Explaining the TOS for Big Data Environment
Hands-on/Demo/Use-case:
- Creating a Project and a Job
- Adding Components in a Job
- Connecting to HDFS
- `Putting` files on HDFS
- Using tMap, tAggregate functions
Hive in Talend
Learning Objectives: In this module of Talend Certification Training, you will learn Hive concepts and the setup of Hive environment in Talend. You will learn how to use Hive Big Data connectors in TOS and implement Use Cases using Hive in Talend.
Skills:
- Integrate Hive with TOS Big Data
- Perform complex Hive queries in Talend
Topics:
- Hive and It’s Architecture
- Connecting to Hive Shell
- Set connection to Hive database using Talend
- Create Hive Managed and external tables through Talend
- Load and Process Hive data using Talend
- Transform data from Hive using Talend
Hands-on/Demo/Use-case:
- Process and transform data from Hive
- Load data from HDFS & Local File Systems to Hive Table using Hive Shell
- Execute the HiveQL query using Talend
Pig and Kafka in Talend
Learning Objectives: In this module of the Talend course, you will learn Pig concepts, the setup of the Pig environment in Talend, and the Pig Big Data connectors in TOS for Big Data, and implement use cases using Pig in Talend. You will also be given an insight into Apache Kafka, its architecture, and its integration with Talend through a real-life use case.
Skills:
- Integrate Talend projects with Pig and Kafka
- Use Pig for scripting and Kafka for streaming jobs in TOS Big Data
- Use TOS Big Data for running Pig and Kafka along with DI, Hadoop HDFS, and Hive
Topics:
- Pig Environment in Talend
- Pig Data Connectors
- Integrate Personalized Pig Code into a Talend job
- Apache Kafka
- Kafka Components in TOS for Big data
Hands-on/Demo/Use-case:
- Use Pig and Kafka connectors in Talend
End to End Project in Talend
Learning Objectives: In this module of Talend Training, you will be developing a Project using Talend DI and Talend BD with MySQL, Hadoop, HDFS, Hive, Pig, and Kafka.
6. Apache Kafka Certification Training
Introduction to Big Data and Apache Kafka
Goal: In this module, you will understand where Kafka fits in the Big Data space, and Kafka Architecture. In addition, you will learn about Kafka Cluster, its Components, and how to Configure a Cluster
Skills:
- Kafka Concepts
- Kafka Installation
- Configuring Kafka Cluster
Objectives: At the end of this module, you should be able to:
- Explain what is Big Data
- Understand why Big Data Analytics is important
- Describe the need of Kafka
- Know the role of each Kafka Components
- Understand the role of ZooKeeper
- Install ZooKeeper and Kafka
- Classify different type of Kafka Clusters
- Work with Single Node-Single Broker Cluster
Topics:
- Introduction to Big Data
- Big Data Analytics
- Need for Kafka
- What is Kafka?
- Kafka Features
- Kafka Concepts
- Kafka Architecture
- Kafka Components
- ZooKeeper
- Where is Kafka Used?
- Kafka Installation
- Kafka Cluster
- Types of Kafka Clusters
- Configuring Single Node Single Broker Cluster
Hands on:
- Kafka Installation
- Implementing Single Node-Single Broker Cluster
Kafka Producer
Goal: Kafka Producers send records to topics. The records are sometimes referred to as Messages. In this Module, you will work with different Kafka Producer APIs.
Skills:
- Configure Kafka Producer
- Constructing Kafka Producer
- Kafka Producer APIs
- Handling Partitions
Objectives:
At the end of this module, you should be able to:
- Construct a Kafka Producer
- Send messages to Kafka
- Send messages Synchronously & Asynchronously
- Configure Producers
- Serialize Using Apache Avro
- Create & handle Partitions
Topics:
- Configuring Single Node Multi Broker Cluster
- Constructing a Kafka Producer
- Sending a Message to Kafka
- Producing Keyed and Non-Keyed Messages
- Sending a Message Synchronously & Asynchronously
- Configuring Producers
- Serializers
- Serializing Using Apache Avro
- Partitions
Hands On:
- Working with Single Node Multi Broker Cluster
- Creating a Kafka Producer
- Configuring a Kafka Producer
- Sending a Message Synchronously & Asynchronously
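A minimal Java sketch of the producer topics above: configuring a producer, then sending one message synchronously and one asynchronously with a callback. The broker address, topic name, keys and values are illustrative.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SimpleProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // broker address is illustrative
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                            // wait for full acknowledgement

        Producer<String, String> producer = new KafkaProducer<>(props);

        // Synchronous send: block until the broker acknowledges the record
        RecordMetadata meta = producer
                .send(new ProducerRecord<>("orders", "order-1", "laptop"))
                .get();
        System.out.println("Written to partition " + meta.partition()
                + " at offset " + meta.offset());

        // Asynchronous send: the callback fires when the send completes or fails
        producer.send(new ProducerRecord<>("orders", "order-2", "phone"),
                (metadata, exception) -> {
                    if (exception != null) exception.printStackTrace();
                    else System.out.println("Async write at offset " + metadata.offset());
                });

        producer.flush();
        producer.close();
    }
}
```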
Kafka Consumer
Goal: Applications that need to read data from Kafka use a Kafka Consumer to subscribe to Kafka topics and receive messages from these topics. In this module, you will learn to construct Kafka Consumer, process messages from Kafka with Consumer, run Kafka Consumer and subscribe to Topics
Skills:
- Configure Kafka Consumer
- Kafka Consumer API
- Constructing Kafka Consumer
Objectives: At the end of this module, you should be able to:
- Perform Operations on Kafka
- Define Kafka Consumer and Consumer Groups
- Explain how Partition Rebalance occurs
- Describe how Partitions are assigned to Kafka Broker
- Configure Kafka Consumer
- Create a Kafka consumer and subscribe to Topics
- Describe & implement different Types of Commit
- Deserialize the received messages
Topics:
- Consumers and Consumer Groups
- Standalone Consumer
- Consumer Groups and Partition Rebalance
- Creating a Kafka Consumer
- Subscribing to Topics
- The Poll Loop
- Configuring Consumers
- Commits and Offsets
- Rebalance Listeners
- Consuming Records with Specific Offsets
- Deserializers
Hands-On:
- Creating a Kafka Consumer
- Configuring a Kafka Consumer
- Working with Offsets
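A matching consumer sketch: configuring a consumer group, subscribing to a topic, the poll loop, and a manual synchronous commit. The broker address, group id and topic are placeholders, and the poll(Duration) call assumes a Kafka 2.x client.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");     // broker address is illustrative
        props.put("group.id", "order-readers");               // consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");             // commit offsets manually

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("orders"));

        try {
            // The poll loop: fetch batches of records and process them
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
                consumer.commitSync();   // synchronous commit after processing the batch
            }
        } finally {
            consumer.close();
        }
    }
}
```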
Kafka Internals
Goal: Apache Kafka provides a unified, high-throughput, low-latency platform for handling real-time data feeds. Learn more about tuning Kafka to meet your high-performance needs.
Skills:
- Kafka APIs
- Kafka Storage
- Configure Broker
Objectives:
At the end of this module, you should be able to:
- Understand Kafka Internals
- Explain how Replication works in Kafka
- Differentiate between In-sync and Out-of-sync Replicas
- Understand the Partition Allocation
- Classify and Describe Requests in Kafka
- Configure Broker, Producer, and Consumer for a Reliable System
- Validate System Reliabilities
- Configure Kafka for Performance Tuning
Topics:
- Cluster Membership
- The Controller
- Replication
- Request Processing
- Physical Storage
- Reliability
- Broker Configuration
- Using Producers in a Reliable System
- Using Consumers in a Reliable System
- Validating System Reliability
- Performance Tuning in Kafka
Hands On:
- Create topic with partition & replication factor 3 and execute it on multi-broker cluster
- Show fault tolerance by shutting down 1 Broker and serving its partition from another broker
Kafka Cluster Architectures & Administering Kafka
Goal: A Kafka Cluster typically consists of multiple brokers to maintain load balance. ZooKeeper is used for managing and coordinating Kafka brokers. Learn about Kafka Multi-Cluster Architectures, Kafka Brokers, Topics, Partitions, Consumer Groups, Mirroring, and ZooKeeper Coordination in this module.
Skills:
- Administer Kafka
Objectives:
At the end of this module, you should be able to
- Understand Use Cases of Cross-Cluster Mirroring
- Learn Multi-cluster Architectures
- Explain Apache Kafka’s MirrorMaker
- Perform Topic Operations
- Understand Consumer Groups
- Describe Dynamic Configuration Changes
- Learn Partition Management
- Understand Consuming and Producing
- Explain Unsafe Operations
Topics:
- Use Cases - Cross-Cluster Mirroring
- Multi-Cluster Architectures
- Apache Kafka’s MirrorMaker
- Other Cross-Cluster Mirroring Solutions
- Topic Operations
- Consumer Groups
- Dynamic Configuration Changes
- Partition Management
- Consuming and Producing
- Unsafe Operations
Hands on:
- Topic Operations
- Consumer Group Operations
- Partition Operations
- Consumer and Producer Operations
Kafka Monitoring and Kafka Connect
Goal: Learn about the Kafka Connect API and Kafka Monitoring. Kafka Connect is a scalable tool for reliably streaming data between Apache Kafka and other systems.
Skills:
- Kafka Connect
- Metrics Concepts
- Monitoring Kafka
Objectives: At the end of this module, you should be able to:
- Explain the Metrics of Kafka Monitoring
- Understand Kafka Connect
- Build Data pipelines using Kafka Connect
- Understand when to use Kafka Connect vs Producer/Consumer API
- Perform File source and sink using Kafka Connect
Topics:
- Considerations When Building Data Pipelines
- Metric Basics
- Kafka Broker Metrics
- Client Monitoring
- Lag Monitoring
- End-to-End Monitoring
- Kafka Connect
- When to Use Kafka Connect?
- Kafka Connect Properties
Hands on:
- Kafka Connect
Kafka Stream Processing
Goal: Learn about the Kafka Streams API in this module. Kafka Streams is a client library for building mission-critical real-time applications and microservices, where the input and/or output data is stored in Kafka Clusters.
Skills:
- Stream Processing using Kafka
Objectives:
At the end of this module, you should be able to:
- Describe What is Stream Processing
- Learn Different types of Programming Paradigm
- Describe Stream Processing Design Patterns
- Explain Kafka Streams & Kafka Streams API
Topics:
- Stream Processing
- Stream-Processing Concepts
- Stream-Processing Design Patterns
- Kafka Streams by Example
- Kafka Streams: Architecture Overview
Hands on:
- Kafka Streams
- Word Count Stream Processing
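The canonical word-count topology is a good sketch of the Streams API covered here. The application id and topic names below are assumptions, and the input and output topics must already exist on the cluster.

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Read lines from the input topic, split into words, group by word, and count
        KStream<String, String> lines = builder.stream("text-input");
        KTable<String, Long> counts = lines
                .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\s+")))
                .groupBy((key, word) -> word)
                .count();

        // Write the running counts to the output topic
        counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        // Close the topology cleanly on shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```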
Integration of Kafka With Hadoop, Storm and Spark
Goal: In this module, you will learn about Apache Hadoop, Hadoop Architecture, Apache Storm, Storm Configuration, and Spark Ecosystem. In addition, you will configure Spark Cluster, Integrate Kafka with Hadoop, Storm, and Spark.
Skills:
- Kafka Integration with Hadoop
- Kafka Integration with Storm
- Kafka Integration with Spark
Objectives:
At the end of this module, you will be able to:
- Understand What is Hadoop
- Explain Hadoop 2.x Core Components
- Integrate Kafka with Hadoop
- Understand What is Apache Storm
- Explain Storm Components
- Integrate Kafka with Storm
- Understand What is Spark
- Describe RDDs
- Explain Spark Components
- Integrate Kafka with Spark
Topics:
- Apache Hadoop Basics
- Hadoop Configuration
- Kafka Integration with Hadoop
- Apache Storm Basics
- Configuration of Storm
- Integration of Kafka with Storm
- Apache Spark Basics
- Spark Configuration
- Kafka Integration with Spark
Hands On:
- Kafka integration with Hadoop
- Kafka integration with Storm
- Kafka integration with Spark
Integration of Kafka With Talend and Cassandra
Goal: Learn how to integrate Kafka with Flume, Cassandra and Talend.
Skills:
- Kafka Integration with Flume
- Kafka Integration with Cassandra
- Kafka Integration with Talend
Objectives:
At the end of this module, you should be able to:
- Understand Flume
- Explain Flume Architecture and its Components
- Setup a Flume Agent
- Integrate Kafka with Flume
- Understand Cassandra
- Learn Cassandra Database Elements
- Create a Keyspace in Cassandra
- Integrate Kafka with Cassandra
- Understand Talend
- Create Talend Jobs
- Integrate Kafka with Talend
Topics:
- Flume Basics
- Integration of Kafka with Flume
- Cassandra Basics such as KeySpace and Table Creation
- Integration of Kafka with Cassandra
- Talend Basics
- Integration of Kafka with Talend
Hands On:
- Kafka demo with Flume
- Kafka demo with Cassandra
- Kafka demo with Talend
Kafka In-Class Project
Goal: In this module, you will work on a project that gathers messages from multiple sources.
Scenario:
In the E-commerce industry, you must have seen how frequently the catalog changes. The deadliest problem companies face is: “How do we keep our inventory and prices consistent?”
Prices appear in several places on Amazon, Flipkart or Snapdeal: the Search page, the Product Description page, and ads on Facebook/Google. You will find mismatches in price and availability between them. From the user’s point of view this is very disappointing: the user spends extra time finding a better product and, in the end, does not purchase it simply because of the inconsistency.
Here you have to build a system that is consistent in nature. For example, if you are getting product feeds either through flat files or an event stream, you have to make sure you don’t lose any events related to a product, especially inventory and price.
Price and availability should always be consistent, because the product may already be sold, the seller may not want to sell it anymore, or there may be some other reason. However, attributes like name and description don’t cause as much noise if they are not updated on time.
Problem Statement
You are given a set of sample products. You have to consume them and push the products to Cassandra/MySQL once they arrive in the consumer. You have to save the below-mentioned fields in Cassandra.
1. PogId
2. Supc
3. Brand
4. Description
5. Size
6. Category
7. Sub Category
8. Country
9. Seller Code
In MySQL, you have to store
1. PogId
2. Supc
3. Price
4. Quantity
Certification Project
This Project enables you to gain Hands-On experience on the concepts that you have learned as part of this Course.
You can email the solution to our Support team within 2 weeks from the Course Completion Date. Edureka will evaluate the solution and award a Certificate with a Performance-based Grading.
Problem Statement:
You are working for a website techreview.com that provides reviews for different technologies. The company has decided to include a new feature in the website which will allow users to compare the popularity or trend of multiple technologies based on Twitter feeds. They want this comparison to happen in real time. So, as a big data developer of the company, you have been tasked with implementing the following:
• Near Real Time Streaming of the data from Twitter for displaying the last minute's count of people tweeting about a particular technology.
• Store the twitter count data into Cassandra.
7. Big Data Masters Program Capstone Project
Project Details
Retail Case Study
The capstone project will provide you with a business case. You will need to solve this by applying all the skills you’ve learned in the courses of the master’s program. This Capstone project will require you to apply the following skills:
• Data Modelling in Cassandra
• Using Kafka as real time messaging system
• Stream data from different sources using Spark
• Analysing Data using Spark
• Leveraging NoSQL database such as Cassandra as a part of data storage strategy
• Using MapReduce for analysis of the data
• Data Warehousing and Data exploration using Hive