2016-03-15
Spark Summit East 2016 was held in NYC last month. Databricks has already published a look back, so I'm going to focus on the (Spark) streaming part here.
The most interesting talks are:
Mike Freedman, CEO and Co-Founder of iobeam, mainly talked about the challenges in applying Spark to IoT.
I like this talk because these challenges are quite general. It's unclear, though, how iobeam solved them with Spark Streaming, which only supports windowing by data arrival time, not event time. iobeam is a data analysis platform designed for IoT. I really enjoy their website, which puts code side by side with use cases.
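To make the arrival-time versus event-time distinction concrete, here is a minimal sketch in plain Python (not Spark code, and not how iobeam does it). The readings, field names, and the 60-second tumbling window are all assumptions made up for illustration. It counts the same three sensor readings per window twice, once keyed on arrival time and once on event time.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # tumbling window size (an assumption for this sketch)


def window_start(ts, size=WINDOW_SECONDS):
    """Return the start of the tumbling window that contains timestamp ts."""
    return ts - (ts % size)


def count_by_window(readings, time_field):
    """Count readings per tumbling window, keyed on the chosen timestamp field."""
    counts = defaultdict(int)
    for reading in readings:
        counts[window_start(reading[time_field])] += 1
    return dict(counts)


# Hypothetical sensor readings: event_time is when the device took the
# measurement, arrival_time is when the record reached the stream processor.
readings = [
    {"device": "a", "event_time": 10, "arrival_time": 12},
    {"device": "a", "event_time": 50, "arrival_time": 55},
    # This device was offline for a while and uploaded its reading late.
    {"device": "b", "event_time": 30, "arrival_time": 130},
]

by_arrival = count_by_window(readings, "arrival_time")
by_event = count_by_window(readings, "event_time")

print("by arrival time:", by_arrival)
print("by event time:  ", by_event)
```

With arrival-time windows the late reading from device "b" is counted in a later window than the one it was measured in. With event-time windows all three readings fall in the same window. Devices that buffer data and upload it late are common in IoT, which is why this limitation matters there.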
This talk is from EMC Video Analytics Data Lake, where Spark Streaming is used for online video processing and detection.
The streaming application feeds offline model training, and the trained model is in turn used for realtime detection.
The talk is really about the architecture evolution from "Larry & Friends" (Oracle) to "Hadoop & Friends" (HDFS, Hive), from Kappa-Architecture to Lambda-Architecture, and finally to Mu-Architecture, all based on Spark.
Note that realtime here means 15 minutes, so a low-latency streaming engine like Storm would be overkill. It is, however, a sweet spot for Spark Streaming, given that the other components in the system are also based on Spark.
Spark 2.0 will add an infinite DataFrames API for Spark Streaming, unified with the existing DataFrames API for batch processing. Event-time aggregations will finally arrive in Spark Streaming.
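The idea behind an "infinite DataFrame" can be sketched without Spark: treat the stream as a table that keeps growing, and run the same query incrementally as new rows are appended. The snippet below is a toy illustration in plain Python, not the Spark 2.0 API. The class name, function name, and sample events are hypothetical.

```python
from collections import Counter


def batch_counts(events):
    """Batch query: count events per key over a finite, complete dataset."""
    return Counter(key for key, _ in events)


class IncrementalCounts:
    """Streaming version of the same query: keeps running state and
    updates it as each micro-batch of new rows is appended."""

    def __init__(self):
        self.state = Counter()

    def update(self, micro_batch):
        # Fold only the newly arrived rows into the existing state.
        self.state.update(key for key, _ in micro_batch)
        return dict(self.state)


# Hypothetical (sensor id, value) events.
events = [
    ("sensor-1", 20.5),
    ("sensor-2", 19.0),
    ("sensor-1", 21.0),
    ("sensor-3", 18.2),
]

stream = IncrementalCounts()
stream.update(events[:2])                     # first micro-batch
streaming_result = stream.update(events[2:])  # second micro-batch
batch_result = dict(batch_counts(events))

print(streaming_result == batch_result)
```

After every micro-batch, the incremental result matches what the batch query would return over all rows seen so far. That is what lets batch and streaming jobs share one API.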
Meanwhile, back pressure and elastic scaling are two important features under development.
Other less interesting use cases