Big Data

Big Data is the technology in which the IT industry is investing at very large scale to manage and maintain massive data sets. These data sets are analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. Big Data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy. Our workshop will introduce all the core concepts of big data computing.

 

Prerequisites:-

Ability to use a computer and search the internet (e.g., via Google).

Preparation:-

Prior programming experience is not required.

Tools Expected:-

Windows-based PC, smartphone with internet access, notebook, and pen

Tools Provided (for the session):-

Hadoop, Hive, Pig, and HBase

Concepts:-

Data types, Hadoop, Sqoop, HBase, and Oozie

Summary:-

This workshop gives participants the opportunity to work on a real-life big data analytics project and gain hands-on experience.

Project:-

  • Real-time Twitter data acquisition
  • Data analysis on Medicare data

Commitment:-

2 days (7 hours each, including a 1-hour lunch break)

Agenda:-

Day 1

Session 1- (03:30 hrs)

Introduction to Big Data
  • How big is Big Data?
  • Definition with real-world examples
  • How Big Data is generated, with real-time examples
  • Uses of Big Data: how industry is utilizing it
  • Traditional data processing technologies
  • The future of Big Data
Hadoop
  • Why Hadoop?
  • What is Hadoop?
  • Hadoop vs. RDBMS, Hadoop vs. Big Data
  • Brief history of Hadoop
  • Apache Hadoop Architecture
  • Problems with traditional large-scale systems
  • Requirements for a new approach
  • Anatomy of a Hadoop cluster
  • Hadoop Setup and Installation
Hadoop Ecosystem
  • Brief introduction to the Hadoop ecosystem (MapReduce, HDFS, Hive, Pig, HBase)
 
Session 2- (02:30 hrs)
HDFS
  • Concepts & Architecture
  • Data Flow (File Read, File Write)
  • Fault Tolerance
  • Shell Commands
  • Java API
  • Data Flow Archives
  • Coherency
  • Data Integrity
  • Role of Secondary NameNode
  • HDFS Programming Basics
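The block splitting and replication behind HDFS's fault tolerance can be pictured with a short Python sketch. This is a minimal single-process simulation, assuming HDFS's common defaults of a 128 MB block size and a replication factor of 3 (the node names are hypothetical; no cluster is required):

```python
# Minimal simulation of HDFS block splitting and replica placement.
# Assumptions: 128 MB block size, replication factor 3 (common HDFS defaults).

BLOCK_SIZE = 128 * 1024 * 1024  # block size in bytes
REPLICATION = 3                 # replicas per block

def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE):
    """Return the size of each block a file of `file_size` bytes splits into."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

def place_replicas(num_blocks: int, nodes: list, replication: int = REPLICATION):
    """Assign each block to `replication` distinct DataNodes (round-robin toy policy)."""
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file
print(len(blocks))   # 3 blocks: 128 MB + 128 MB + 44 MB
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(placement[0])  # ['dn1', 'dn2', 'dn3']
```

Losing any one DataNode still leaves two copies of every block, which is the intuition behind HDFS fault tolerance (the real NameNode uses rack-aware placement, not round-robin).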
Session Recap
 
Day 2
Session 1- (03:30 hrs)
MapReduce
  • Theory
  • MapReduce Architecture
  • Data Flow (Map – Shuffle – Reduce)
  • mapred vs. mapreduce APIs (old vs. new Java packages)
  • MapReduce Programming Basics
  • Programming [ Mapper, Reducer, Combiner, Partitioner ]
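The Map - Shuffle - Reduce flow above can be simulated in a few lines of Python, using word count, the canonical MapReduce example. This is a single-process sketch for intuition only; real MapReduce runs mappers and reducers as distributed tasks over HDFS input splits:

```python
# Single-process simulation of the Map -> Shuffle -> Reduce data flow
# for word count (illustrative; real jobs are distributed by the framework).
from collections import defaultdict

def mapper(line):
    # Map: emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    # Reduce: sum the counts for each word.
    return (key, sum(values))

lines = ["big data big ideas", "data drives decisions"]
mapped = [pair for line in lines for pair in mapper(line)]
reduced = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(reduced)  # {'big': 2, 'data': 2, 'ideas': 1, 'drives': 1, 'decisions': 1}
```

A Combiner would run the same summing logic on each mapper's local output before the shuffle, and a Partitioner would decide which reducer each key is routed to.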
Hive
  • Architecture
  • Installation
  • Configuration
  • Hive vs RDBMS
  • Tables
  • DDL & DML
  • Partitioning & Bucketing
  • Hive Web Interface
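Partitioning and bucketing can be pictured with a small Python sketch: each partition-column value maps to its own directory under the table's warehouse location, and within a partition each row lands in bucket file hash(bucket column) mod number-of-buckets. The `visits` table and its columns here are hypothetical, and Hive's actual hash function differs from this toy version:

```python
# Toy sketch of how Hive lays out a partitioned, bucketed table on HDFS.
# Hypothetical table `visits`, partitioned by `country`, bucketed by `user_id`.

NUM_BUCKETS = 4

def bucket_of(user_id: int, num_buckets: int = NUM_BUCKETS) -> int:
    # Hive buckets by hash(column) mod num_buckets; for this toy, ints hash to themselves.
    return user_id % num_buckets

def partition_path(table: str, country: str) -> str:
    # Each value of the partition column becomes a directory under the table.
    return f"/user/hive/warehouse/{table}/country={country}"

rows = [(101, "IN"), (102, "US"), (103, "IN")]
for user_id, country in rows:
    print(partition_path("visits", country), "-> bucket", bucket_of(user_id))
```

Partition pruning is why this matters: a query filtered on `country = 'IN'` only has to read that one directory instead of scanning the whole table.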
Session 2- (02:30 hrs)
Pig
  • Why Pig?
  • Use cases of Pig
HBase
  • RDBMS vs. NoSQL
  • HBase Introduction
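The contrast between an RDBMS table and HBase's data model can be sketched as a nested map: row key, then column family, then qualifier, then value. This is a toy in-memory sketch (the table, row keys, and column names are hypothetical), but it shows the key point that HBase rows are sparse, with each row storing only the columns it actually has:

```python
# Toy sketch of HBase's data model:
#   row key -> "family:qualifier" -> value
# Unlike a fixed-schema RDBMS table, rows are sparse and column
# qualifiers can differ from row to row.

table = {}  # row key -> {"family:qualifier": value}

def put(row_key, family, qualifier, value):
    # Write one cell, creating the row on first write.
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

def get(row_key, family, qualifier):
    # Read one cell; missing rows/columns simply return None.
    return table.get(row_key, {}).get(f"{family}:{qualifier}")

put("user#1001", "info", "name", "Asha")
put("user#1001", "stats", "logins", 42)
put("user#1002", "info", "name", "Ravi")  # no stats columns at all: sparse row

print(get("user#1001", "stats", "logins"))  # 42
print(get("user#1002", "stats", "logins"))  # None
```

Real HBase adds timestamps per cell and keeps rows sorted by row key across region servers, which is what makes range scans by key efficient.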
Session Recap
 
Zonal Round of SkillThon
  • Competition
  • Certificate distribution and acknowledgement

Charges:

INR 1200 (exclusive of GST) per participant