2016-05-08
I'd like to start this week with something other than code: how to present code. There are a few basic rules to follow, the most basic of which is: remember, your slides are not your IDE.
Now let's get down to code.
Unlike the batch world, where Spark "rules" (Tez, Flink, anyone?), the streaming world has entered a war era: there are many streaming solutions, each with its own pros and cons. Need an apples-to-apples comparison? An Overview of Apache Streaming Technologies is for you.
Although listed in the previous comparison, Apache Beam is not yet another streaming technology; rather, it aims to
provide the world with an easy-to-use, but powerful model for data-parallel processing, both streaming and batch, portable across a variety of runtime platforms
It was formerly the Google Cloud Dataflow SDK and requires a runner (e.g. Google Cloud Dataflow, Flink, Spark) to work. Why Apache Beam? A Google Perspective explains why open-sourcing the project makes sense for Google, from the business perspective too.
That motivation hinges primarily on the desire to get as many Apache Beam pipelines as possible running on Cloud Dataflow.
Beam also has nice features such as auto-scaling; Streaming Auto-scaling in Google Cloud Dataflow has more details.
Apache Kafka has a data structure called the purgatory, which holds any request that hasn't yet met its criteria to succeed but also hasn't yet resulted in an error. Apache Kafka, Purgatory, and Hierarchical Timing Wheels talks about how Kafka efficiently keeps track of the tens of thousands of requests that are being asynchronously satisfied by other activity in the cluster. The hierarchical timing wheel is really a great data structure to know.
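To give a feel for the idea, here is a minimal, single-threaded sketch of a hierarchical timing wheel. All names are my own and this is far simpler than Kafka's actual implementation: a wheel holds `wheel_size` buckets of `tick_ms` each, and timers that fall beyond the wheel's span overflow into a lazily created coarser parent wheel, cascading back down as time advances.

```python
class TimingWheel:
    """A toy hierarchical timing wheel (not Kafka's real code).

    Buckets cover `tick_ms` each; timers beyond `tick_ms * wheel_size`
    go into a coarser parent wheel whose tick equals this wheel's span.
    """

    def __init__(self, tick_ms, wheel_size, start_ms=0):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size           # span of this wheel
        self.current = start_ms - start_ms % tick_ms   # clock, tick-aligned
        self.buckets = [[] for _ in range(wheel_size)]
        self.overflow = None                           # coarser parent wheel

    def add(self, expiration_ms, task):
        """Schedule `task`; returns False if it is already due (run it now)."""
        if expiration_ms < self.current + self.tick_ms:
            return False
        if expiration_ms < self.current + self.interval:
            idx = (expiration_ms // self.tick_ms) % self.wheel_size
            self.buckets[idx].append((expiration_ms, task))
            return True
        # Too far out for this wheel: delegate to the coarser parent.
        if self.overflow is None:
            self.overflow = TimingWheel(self.interval, self.wheel_size,
                                        self.current)
        return self.overflow.add(expiration_ms, task)

    def advance(self, now_ms):
        """Advance the clock to `now_ms` and return the expired tasks."""
        return [task for _, task in self._advance(now_ms)]

    def _advance(self, now_ms):
        expired = []
        while self.current + self.tick_ms <= now_ms:
            self.current += self.tick_ms
            idx = (self.current // self.tick_ms) % self.wheel_size
            expired.extend(self.buckets[idx])
            self.buckets[idx] = []
        if self.overflow is not None:
            # Entries falling out of the coarser wheel either expire now or
            # cascade back down into this wheel's finer-grained buckets.
            for exp_ms, task in self.overflow._advance(now_ms):
                if exp_ms < self.current + self.tick_ms:
                    expired.append((exp_ms, task))
                else:
                    self.add(exp_ms, task)
        return expired


wheel = TimingWheel(tick_ms=1, wheel_size=8)
wheel.add(3, "req-a")
wheel.add(20, "req-b")    # beyond 8 ticks: spills into the overflow wheel
print(wheel.advance(5))   # ['req-a']
print(wheel.advance(20))  # ['req-b'] (cascaded back down, then expired)
```

The payoff is that insertion and per-tick expiry are O(1) amortized, versus O(log n) per operation for a heap-based delay queue, which matters when tens of thousands of requests sit in purgatory at once.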
Storm 1.0.1 has been released. It's a maintenance release that includes a number of important bug fixes improving Storm's performance, stability, and fault tolerance.
"Does the Database Community Have an Identity Crisis?"
Think about it. I'll leave you here.