Spark Module 3 Machine Learning SparkML

Machine Learning for Big Data

The course will take on average 3 days to complete, including practical work

  • Learn the basics of Machine Learning and how to apply it to big data with SparkML
  • Supervised vs Unsupervised Learning
  • Linear Regressions
  • Logistic Regressions
  • Decision Trees
  • K-Means Clusters
  • Random Forests
  • Recommender Systems
We assume you're already familiar with Spark Core from modules 1 and 2.


Having problems? Check the errata.

Introduction 24m 2s

What is Machine Learning, Supervised vs Unsupervised Learning and the Model Building Process


Building a Linear Regression 30m 40s

Assembling vectors of features and Model Fitting


Training Data 26m 33s

Training vs Test and Holdout Data, Using data from Kaggle, RMSE and R² tests
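
As a rough illustration of the two metrics named in this chapter, the sketch below computes RMSE and R squared by hand in plain Python. It does not use SparkML; the function names and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(actual, predicted):
    """R squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical holdout labels and model predictions (not course data).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]

print(rmse(actual, predicted))       # about 0.707
print(r_squared(actual, predicted))  # about 0.9
```

A lower RMSE and an R squared closer to 1 both indicate a closer fit on the data being scored, which is why they are usually reported on test or holdout data rather than on the training set.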


Model Fitting Parameters 25m 41s

Setting Linear Regression Parameters


Feature Selection 36m 23s

Correlation of features, Identifying duplicate features, data preparation
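
To make the idea of spotting duplicate features concrete, here is a minimal plain-Python sketch of the Pearson correlation coefficient. The feature names and values are hypothetical; two columns that carry the same information in different units show a correlation very close to 1, which suggests one of them can be dropped.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical features: floor area in square feet and in square metres
# carry the same information, while bedroom count is only loosely related.
sqft = [1000.0, 1500.0, 2000.0, 2500.0]
sqm = [area * 0.0929 for area in sqft]
bedrooms = [3.0, 2.0, 4.0, 3.0]

print(pearson(sqft, sqm))       # very close to 1.0: a duplicate feature
print(pearson(sqft, bedrooms))  # much weaker correlation
```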


Non Numeric Data 25m 48s

Using OneHotEncoding and Vectors
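
The general idea behind one-hot encoding can be sketched in a few lines of plain Python. This is a concept illustration only, with a hypothetical helper name and sample data; it does not reproduce how SparkML's own encoder represents or orders its output.

```python
def one_hot_encode(values):
    """Map each category to a vector with a single 1.0 in its own slot."""
    categories = sorted(set(values))
    slot = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        row = [0.0] * len(categories)
        row[slot[value]] = 1.0
        vectors.append(row)
    return categories, vectors


# Hypothetical non-numeric column.
colours = ["red", "green", "blue", "green"]
categories, vectors = one_hot_encode(colours)

print(categories)  # ['blue', 'green', 'red']
print(vectors[0])  # [0.0, 0.0, 1.0] for "red"
```

Encoding categories this way avoids implying an order between them, which a single numeric code (blue = 0, green = 1, red = 2) would do.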


Pipelines 19m 42s

How to build a pipeline in SparkML


Case Study 34m 51s

A full practical exercise


Logistic Regression 26m 12s

True and False Negatives and Positives, Coding a Logistic Regression Model
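
The four outcomes named above can be counted directly from labels and predictions. The sketch below is plain Python with hypothetical data, assuming binary labels where 1 is the positive class; it is not taken from the course material.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            counts["tp"] += 1  # predicted positive, actually positive
        elif p == 1 and a == 0:
            counts["fp"] += 1  # predicted positive, actually negative
        elif p == 0 and a == 0:
            counts["tn"] += 1  # predicted negative, actually negative
        else:
            counts["fn"] += 1  # predicted negative, actually positive
    return counts


# Hypothetical labels and classifier output.
actual = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]

counts = confusion_counts(actual, predicted)
print(counts)  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}

# Accuracy is the share of correct predictions among all predictions.
accuracy = (counts["tp"] + counts["tn"]) / len(actual)
print(accuracy)
```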


Decision Trees 46m 21s

Building a decision tree model, Interpreting a tree and Random Forests


Unsupervised Learning: K-Means Clustering 10m 49s

K-Means Clustering and how to implement it in SparkML


Recommender Systems 29m 7s

Matrix Factorisation and how to build a model in SparkML

Copyright ©2024