Spark Module 4 Streaming and Structured Streaming

featuring Structured Streaming and Apache Kafka

The course is over 3 hours long and, with the practical work, should take a day or two to complete.

  • Learn how to use Apache Spark for real-time streaming big data!
  • Both DStreams and Structured Streaming are covered
  • Use Apache Kafka to build a near-continuous, real-time big data pipeline

We'll assume you're already familiar with Spark and SparkSQL. Modules 1 and 2 in this series cover the basics.

Contents

Having problems? Check the errata.

Introduction and DStreams 55m 53s

DStreams is an older API but it is still in use, so we'll establish the basics of Streaming with this API. We'll use a simple socket server to simulate a stream of data.
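As a rough illustration of the "simple socket server" idea, the sketch below serves a fixed list of records over TCP, one per line, so that a streaming client has something to read. It is not the course's own code: the function names `start_line_server` and `read_all_lines`, the sample records, and the choice of Python are all assumptions made for this example. A Spark DStream job could read a source like this with `socketTextStream(host, port)`.

```python
import socket
import threading
import time


def start_line_server(lines, host="127.0.0.1", port=0, delay=0.0):
    """Send each string in `lines` as one newline-terminated record to the first client.

    Hypothetical helper for illustration. port=0 asks the OS for a free port.
    Returns (bound_port, thread).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)  # listening before the thread starts, so clients can connect at once
    bound_port = server.getsockname()[1]

    def run():
        with server:
            conn, _ = server.accept()
            with conn:
                for line in lines:
                    conn.sendall((line + "\n").encode("utf-8"))
                    time.sleep(delay)  # pause between records to mimic a live feed

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return bound_port, thread


def read_all_lines(host, port):
    """Connect as a client and collect records until the server closes the connection."""
    with socket.create_connection((host, port)) as client:
        data = b""
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.decode("utf-8").splitlines()


if __name__ == "__main__":
    # Made-up sample records; a real simulation would generate them continuously.
    port, _ = start_line_server(["WARN,1", "ERROR,2", "INFO,3"], delay=0.5)
    print(read_all_lines("127.0.0.1", port))
```

Raising `delay` makes the records arrive gradually, which is closer to what a micro-batch streaming job would see.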

Integrating with Apache Kafka 77m 52s

Apache Kafka is a high-performance, distributed event log that is well suited to streaming applications. Here we use it as a repository for holding a real-time stream of events. We integrate with Spark Streaming using the Kafka module.
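To make the "event log" idea concrete, here is a toy, in-memory sketch of an append-only log in which each consumer tracks its own read offset. It is only a conceptual model and uses none of Kafka's actual API: the `EventLog` and `Consumer` classes and the sample events are hypothetical names invented for this example, and real Kafka adds topics, partitions, replication, and persistence on top of this idea.

```python
class EventLog:
    """Toy append-only log: events are never modified, only added at the end."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Store an event and return its offset (its position in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset, max_events=10):
        """Return up to max_events events starting at the given offset."""
        return self._events[offset:offset + max_events]


class Consumer:
    """Toy consumer that remembers how far it has read."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self, max_events=10):
        """Fetch the next batch and advance this consumer's own offset."""
        batch = self.log.read(self.offset, max_events)
        self.offset += len(batch)
        return batch


if __name__ == "__main__":
    log = EventLog()
    for event in ["login:alice", "view:course-4", "logout:alice"]:
        log.append(event)

    fast, slow = Consumer(log), Consumer(log)
    print(fast.poll())              # reads everything available so far
    print(slow.poll(max_events=1))  # reads at its own pace
    print(slow.poll())              # picks up where it left off
```

Because reading does not remove events, several consumers can process the same stream independently, and a consumer can replay history by resetting its offset.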

Structured Streaming 66m 45s

This newer API builds on the SparkSQL/DataFrame API and is a much more elegant system. In this chapter we rebuild our previous work with Structured Streaming and see how it can be used to build a streaming pipeline.

Copyright ©2024 VirtualPairProgrammers.com