
Spark Module 4: Streaming and Structured Streaming

featuring Structured Streaming and Apache Kafka
  • Learn how to use Apache Spark for real-time streaming big data!
  • Both DStreams and Structured Streaming are covered
  • Use Apache Kafka to build a near-continuous real-time big data pipeline

Pre-requisites

We'll assume you're already familiar with Spark and SparkSQL; modules 1 and 2 in this series cover the basics.

Contents - The course is over 3 hours long and, with practical work, should take a day or two to complete.

 

Having problems? Check the errata for this course.

1. Introduction and DStreams (55m 53s)

DStreams is an older API but it is still in use, so we'll establish the basics of Streaming with this API. We'll use a simple socket server to simulate a stream of data.
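
To give a feel for the "simple socket server" idea, here is a minimal sketch in plain Java. It is not the course's own code: the class name `EventSocketServer`, the port 8989, the event format (`LEVEL,sequenceNumber`) and the timing values are all assumptions made for illustration. The server waits for one client and writes newline-terminated text lines to it, which is the shape of input a socket-based text stream expects.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class EventSocketServer {

    // Hypothetical event levels, cycled through to fake a log stream.
    static final List<String> LEVELS = List.of("INFO", "WARN", "ERROR");

    // Build the i-th fake event as "LEVEL,i" (an assumed format).
    static String eventLine(int i) {
        return LEVELS.get(i % LEVELS.size()) + "," + i;
    }

    // Accept a single client, send `count` newline-terminated events,
    // pausing `delayMillis` between them, then close the connection.
    static void serveOnce(ServerSocket server, int count, long delayMillis)
            throws IOException, InterruptedException {
        try (Socket client = server.accept();
             Writer out = new OutputStreamWriter(
                     client.getOutputStream(), StandardCharsets.UTF_8)) {
            for (int i = 0; i < count; i++) {
                out.write(eventLine(i) + "\n");
                out.flush(); // push each event out immediately
                Thread.sleep(delayMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Port 8989 is an arbitrary choice for this sketch.
        try (ServerSocket server = new ServerSocket(8989)) {
            serveOnce(server, 1000, 100);
        }
    }
}
```

With this running, a DStreams job could read the lines through `JavaStreamingContext.socketTextStream("localhost", 8989)`, or you can inspect the output by hand with any TCP client that connects to the same port.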

2. Integrating with Apache Kafka (77m 52s)

Apache Kafka is a highly performant, distributed event log and is well suited to streaming applications. Here we use it as a repository for holding a real-time stream of events. We integrate with Spark Streaming using the Kafka module.

3. Structured Streaming (66m 45s)

This newer API builds on the SparkSQL/DataFrame API and is a much more elegant system. In this chapter we rebuild our previous work with Structured Streaming and see how it can be used to build a streaming pipeline.
