
Spark Module 3: Machine Learning (SparkML)

Machine Learning for Big Data
  • Learn the basics of Machine Learning and how to apply it to big data with SparkML
  • Supervised vs Unsupervised Learning
  • Linear Regressions
  • Logistic Regressions
  • Decision Trees
  • K-Means Clusters
  • Random Forests
  • Recommender Systems

Pre-requisites

We assume you're already familiar with Spark Core from modules 1 and 2.

Contents - The course will take on average 3 days to complete, including practical work

 


1

Introduction


24m 2s
What is Machine Learning, Supervised vs Unsupervised Learning and the Model Building Process

2

Building a Linear Regression


30m 40s
Assembling vectors of features and Model Fitting

3

Training Data


26m 33s
Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R2 tests
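As a rough illustration of the two tests named in this chapter, the sketch below computes RMSE and R2 by hand on a tiny set of holdout values. It is plain Java rather than the SparkML API; the class name `HoldoutMetrics`, the method names and the sample numbers are all made up for this example.

```java
// Hypothetical helper for illustration only; not part of SparkML.
public class HoldoutMetrics {

    // RMSE: square root of the average squared prediction error.
    public static double rmse(double[] actual, double[] predicted) {
        double sumSquaredError = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / actual.length);
    }

    // R2: 1 minus (residual sum of squares / total sum of squares around the mean).
    public static double r2(double[] actual, double[] predicted) {
        double mean = 0.0;
        for (double value : actual) {
            mean += value;
        }
        mean /= actual.length;

        double residualSumOfSquares = 0.0;
        double totalSumOfSquares = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            double deviation = actual[i] - mean;
            residualSumOfSquares += error * error;
            totalSumOfSquares += deviation * deviation;
        }
        return 1.0 - residualSumOfSquares / totalSumOfSquares;
    }

    public static void main(String[] args) {
        // Made-up holdout labels and model predictions.
        double[] actual = {3.0, 5.0, 7.0};
        double[] predicted = {2.0, 5.0, 8.0};
        System.out.println("RMSE = " + rmse(actual, predicted));
        System.out.println("R2   = " + r2(actual, predicted));
    }
}
```

With these sample values the RMSE is about 0.82 and R2 is 0.75. A lower RMSE and an R2 closer to 1 both indicate a better fit on the held-out data.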

4

Model Fitting Parameters


25m 41s
Setting Linear Regression Parameters

5

Feature Selection


36m 23s
Correlation of features, Identifying duplicate features, data preparation

6

Non Numeric Data


25m 48s
Using OneHotEncoding and Vectors

7

Pipelines


19m 42s
How to build a pipeline in SparkML

8

Case Study


34m 51s
A full practical exercise

9

Logistic Regression


26m 12s
True and False Negatives and Positives, Coding a Logistic Regression Model
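To make the four outcomes concrete, here is a minimal sketch that counts true and false positives and negatives for a binary classifier. It is plain Java, not SparkML code; the class name `ConfusionCounts`, the 0/1 label encoding and the sample data are assumptions made for this example.

```java
// Hypothetical helper for illustration only; not part of SparkML.
public class ConfusionCounts {

    // Returns {truePositives, falsePositives, trueNegatives, falseNegatives}.
    // Assumes labels are encoded as 1 (positive) and 0 (negative).
    public static int[] count(int[] actual, int[] predicted) {
        int truePositives = 0;
        int falsePositives = 0;
        int trueNegatives = 0;
        int falseNegatives = 0;
        for (int i = 0; i < actual.length; i++) {
            boolean isPositive = actual[i] == 1;
            boolean predictedPositive = predicted[i] == 1;
            if (predictedPositive && isPositive) {
                truePositives++;      // predicted positive, really positive
            } else if (predictedPositive) {
                falsePositives++;     // predicted positive, really negative
            } else if (isPositive) {
                falseNegatives++;     // predicted negative, really positive
            } else {
                trueNegatives++;      // predicted negative, really negative
            }
        }
        return new int[] {truePositives, falsePositives, trueNegatives, falseNegatives};
    }

    public static void main(String[] args) {
        // Made-up labels and predictions.
        int[] actual = {1, 0, 1, 1, 0, 0};
        int[] predicted = {1, 1, 0, 1, 0, 0};
        int[] counts = count(actual, predicted);
        System.out.println("TP=" + counts[0] + " FP=" + counts[1]
                + " TN=" + counts[2] + " FN=" + counts[3]);
    }
}
```

For the sample data this gives 2 true positives, 1 false positive, 2 true negatives and 1 false negative, so 4 of the 6 predictions are correct.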

10

Decision Trees


46m 21s
Building a decision tree model, Interpreting a tree and Random Forests

11

Unsupervised Learning: K-Means Clustering


10m 49s
K-Means Clustering and how to implement it in SparkML

12

Recommender Systems


29m 7s
Matrix Factorisation and how to build a model in SparkML
