featuring Structured Streaming and Apache Kafka
The course is over 3 hours long and, with practical work, should take a day or two to complete.
Having problems? Check the errata.
Introduction and DStreams (55m 53s): DStreams is an older API, but it is still in use, so we'll establish the basics of streaming with it. We'll use a simple socket server to simulate a stream of data.
Integrating with Apache Kafka (77m 52s): Apache Kafka is a highly performant, distributed event log and is well suited to streaming applications. Here we use it as a repository for holding a real-time stream of events. We integrate it with Spark Streaming using the Kafka module.
Structured Streaming (66m 45s): This newer API builds on the Spark SQL/DataFrame API and is a much more elegant system. In this chapter we rebuild our previous work and discover how the API can be used to build a streaming pipeline.