2016-07-24
Okay, the Weekly Reading has finally, embarrassingly, turned into a Monthly Reading. I'll leave the id auto-incremented to 0x8.
We introduced Kafka Streams about 2 months ago but haven't talked about how it fits into the stream processing landscape. My two cents: if you already use Kafka, you get the full functionality of stream processing without adding and maintaining another system. That makes a lot of sense, since the premise holds in many cases. What's different about Kafka Streams? Michael Noll from Confluent wrote a series about its unique features.
SQL on streaming has split into two worlds. One is Spark SQL over Structured Streaming. The other is StreamingSQL from Apache Calcite, which Storm, Flink and Samza are integrating as their SQL layers.
Another buzzword is IoT Analytics (with streaming). Storm PMC member Taylor Goetz covered this topic in Beyond the Tweeting Toaster: IoT Analytics with Apache Storm, Kafka and Arduino. It is, however, a one-way flow: sensor -> connector -> kafka -> storm.
Spotify runs its services on Google Compute Engine and has adopted Google Cloud Dataflow for its data processing work. They've implemented a Scala DSL on top of the Dataflow SDK, which is now being moved into Apache Beam. Check it out at Handling Streaming Data in Spotify Using the Cloud.
Aysylu Greenberg discussed patterns in distributed systems, drawn from systems she has worked on at Google.
Apache Spark Key Terms Explained covers all the major concepts in Spark from APIs to components.
Speaking of APIs, have you ever wondered what the relationship is between Spark's three APIs: RDD, DataFrame and Dataset? Here is A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets.
Spark Summit 2016 took place from June 6 to June 8 in San Francisco. In Spark Summit 2016 Review, Michael Malak said:
Spark might have been overhyped during the 1.x days, but with Spark 2.0 it's caught up to the hype generated during the 1.0 days.
If you've been puzzled by Scala compiler errors and don't know what to do next, Scala Clippy, which adds helpful messages, might be the right compiler plugin for you.
Stackoverflow: 7 of the Best Java Answers That You Haven’t Seen is a summary by Takipi of the most interesting Java Q&A on Stack Overflow.
Stack Overflow had a 34-minute outage on July 20. From the official Outage Postmortem:
The direct cause was a malformed post that caused one of our regular expressions to consume high CPU on our web servers.
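To get a feel for how a single post can do that, here is a minimal Python sketch of the general failure mode. The pattern `\s+$` and both function names are my own illustration, not taken from the postmortem. The idea is that a backtracking regex engine, given a long run of whitespace that is not at the end of the string, may retry the match from every position in that run, so the work can grow roughly quadratically with the length of the run.

```python
import re

# Hypothetical pattern for illustration only; it is not quoted from the postmortem.
TRAILING_WS = re.compile(r"\s+$")


def trim_with_regex(text: str) -> str:
    """Strip trailing whitespace using a regex substitution.

    If `text` contains a long run of whitespace followed by a
    non-whitespace character, a backtracking engine can try the
    run from each starting position and fail every time.
    """
    return TRAILING_WS.sub("", text)


def trim_without_regex(text: str) -> str:
    """Strip trailing whitespace with a single linear scan."""
    return text.rstrip()


# A "malformed post" in this sketch: lots of whitespace, then one more character.
malformed = "x" + " " * 2000 + "y"

# Both functions return the same results; only the amount of work differs.
print(trim_with_regex("post body   ") == trim_without_regex("post body   "))
print(trim_with_regex(malformed) == trim_without_regex(malformed))
```

Swapping the regex for a plain string operation like `rstrip()` is one way to avoid the problem in a sketch like this; whether it fits a real codebase depends on what the pattern was actually doing.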
Nathan Marz questioned the way recruiting is commonly done in The limited value of a computer science education.
Whether someone can or cannot solve some cute algorithm problem in a high-pressure situation tells you nothing about that person's ability to write solid, clean, well-structured programs in normal working conditions.
He said take-home projects are a better alternative. Yes, I agree; last time, I asked the interviewees to give a brief presentation on one of the awesome-bigdata lists.