Web data can be harvested manually, by people “cutting and pasting”, or with any number of web scraping, mining and harvesting tools. Essentially, extracting data from the web involves capturing what you see in your web browser. However, because so much web content is dynamic, extraction requires a browser-enabled approach.

When defining web data, most people focus on well-known consumer and social media sites such as Google, Amazon, Wikipedia, Facebook and Twitter, but the most valuable data is usually located somewhere else. It could be on industry-specific or location-specific sites, password-protected B2B portals, cloud apps, government sites and even your competitor’s site. It also includes data locked in applications that live inside your firewall. Once you expand the extraction to include data from all relevant sources, and understand how easy that data is to collect and make available to your employees, you begin to realize the enormous potential that real-time web data offers.

Media3 offers an exclusive training program. The full program runs for 2 months, with 5 sessions per week. The final capstone project takes 1 week and includes 15 hours of live development experience.

Learn, Work & Innovate @ Media3

Course Overview

The course is divided into two parts:

1) Basic Python Programming

2) Python for Data Science

The first module will contain 15 sessions

Session-1 :

Why should you learn to write programs?
What is a program?
The building blocks of a program
Introduction to Python programming
Why Python?
Differences between Python 2 and Python 3
Installing Python

Session-2 :

Values and types
Python reserved keywords
Statements in Python
Operators and operands in Python
Expressions in Python
String operators
Running Python from the terminal

Session-3 :

Boolean Expressions
Logical Operators
Conditional/Alternative/Chained execution
Nested Conditionals

Session-4 :
Built-in Functions in Python
Type conversion functions in Python
Random numbers in Python
Math functions in Python – the math library
Adding new functions
Use of generators in Python

Session-5 :

While statement
Break and continue statements
For loops

Session-6 :

The len function
Strings and string slices in Python
Looping and counting
The in operator
String methods and parsing strings

Session-7 :

File handling in Python
Opening files
Reading files
Searching through a file
Writing files

Session-8 :

Lists in Python
List slices
List methods
Lists and Functions
Lists and strings

Session-9 :

Looping and Dictionaries

Advanced text parsing

Session-10 :

Comparing tuples
Tuple Assignment
Dictionaries and tuples
Multiple assignment with dictionaries

Session-11 :

Regular expressions in Python
Extracting data using regular expressions
Combining searching and extracting
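
As a rough illustration of the three topics in this session, here is a minimal sketch using Python's standard re module. The sample text and email addresses are made up for the example, and the address pattern is deliberately simple rather than a complete email validator.

```python
import re

# Hypothetical sample text; the addresses are invented for illustration.
text = "From: alice@example.com\nTo: bob@example.org\nSubject: hello"

# Searching: is there a line that starts with "From:"?
has_from = re.search(r"^From:", text, flags=re.MULTILINE) is not None

# Extracting: collect every substring that looks like an email address.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

# Combining searching and extracting: match the "From:" line, but let the
# capture group return only the address that follows it.
sender = re.findall(r"^From: (\S+@\S+)", text, flags=re.MULTILINE)

print(has_from, emails, sender)
```

When a pattern contains a capture group, re.findall returns only the captured part, which is what makes the combined form convenient.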

Session-12 :

Handling JSON files using Python
Handling HTML files using Python
Handling XML files using Python
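
To give a feel for the JSON part of this session, here is a small sketch using the standard json module. The record is a made-up example; for real files, json.load and json.dump work the same way on open file objects.

```python
import json

# Hypothetical JSON text; in practice this would usually be read from a file.
raw = '{"course": "Python", "sessions": [1, 2, 3]}'

# Parse the JSON text into ordinary Python dicts and lists.
data = json.loads(raw)

# Modify the data like any other Python structure.
data["sessions"].append(4)

# Serialize it back to JSON text, with keys sorted for stable output.
round_trip = json.dumps(data, sort_keys=True)

print(round_trip)
```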

Session-13 :

Web scraping using Python (this may take up to 5 sessions if needed; Selenium will be covered as an optional topic)

Session-14 :


Session-15 :

Python OOP concepts
Use of Python class functions
super() in Python
Building packages using Python
Using decorators in Python
Methods in Python

Session-16 :

Project-1
Project-2
Project-3

Each project will take two days.


Python for Data Science


Installing the Anaconda package
Understanding its use
Installing packages from a yml file
Installing packages using pip
Installing packages using conda
Installing packages from Git


Introduction to Git
Using Git in Live Projects


Creating Environments in Python
Using Jupyter notebooks


Installing NumPy
Using NumPy for matrix operations and data handling
NumPy operations (around 50 operations will be discussed)


Using SciPy
SciPy functions (around 50 important operations will be discussed)
Advanced algebraic functions in SciPy
Contributing to SciPy


Using Pandas
Data reading and manipulation using pandas
Data harmonizing using pandas


Using Matplotlib (Graphics in Python)
Using bar plots, scatter plots, histograms, stacked bar charts, pie charts, etc.
Using seaborn


Integrating NumPy, SciPy, pandas and Matplotlib for data manipulation


Statistical Analysis using Python
Mean, Median, Mode
Generating data distributions
Random variable generation
Variance and standard deviation
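
The quantities listed above can be sketched with nothing more than Python's standard library. This is only an illustration on a small made-up data set; the course itself may use NumPy, SciPy or pandas for the same calculations.

```python
import random
import statistics

# Hypothetical sample of eight scores, chosen so the results are round numbers.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle value of the sorted data
mode = statistics.mode(scores)        # most frequent value
pvar = statistics.pvariance(scores)   # population variance
pstd = statistics.pstdev(scores)      # population standard deviation

# Random variable generation: 1000 draws from a normal distribution with
# mean 0 and standard deviation 1, seeded so the run is repeatable.
rng = random.Random(0)
samples = [rng.gauss(0, 1) for _ in range(1000)]

print(mean, median, mode, pvar, pstd, len(samples))
```

Note that pvariance and pstdev treat the data as the whole population; statistics.variance and statistics.stdev are the sample versions, which divide by n - 1 instead of n.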


Conditional Probability


ANOVA tests
Chi-Square tests
Hypothesis testing in Python


Linear regression using Python


Logistic Regression using Python


Risk Analytics (Project-1)


HR Analytics (Project-2)


Churn and Telecom Analytics (Project-3)
Projects are subject to change.
Each session from session 9 onward will take 3-4 days to complete, depending on the employee's or student's understanding of mathematics.