Biweekly Reading 0x10

2017-02-20

It's 2017 and I'm still struggling with my procrastination. The last reading was more than a month ago, and articles have been piling up in my Pocket list. To avoid overburdening myself and scaring myself off this routine, I'd like to start again with just a biweekly reading. Another reason is that the value of news decays as quickly as the value of data. I'll start with open source releases.

Releases

Google announced the public beta for Cloud Spanner: a global database service for mission-critical applications. People have been discussing whether it beats the CAP theorem. Eric Brewer, the author of CAP and VP of Google Cloud, shared his view: Spanner is technically a CP system, while providing more than five 9s of availability in practice. This is achieved through Google's private network.

Google controls the entire network and thus can ensure redundancy of hardware and paths, and can also control upgrades and operations in general.
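To put "five 9s" in perspective, here is a rough back-of-the-envelope calculation of the downtime such an availability figure allows per year. The function name and the 365-day year are my own assumptions for illustration; this is not how Google measures or reports Spanner's availability.

```python
def max_downtime_minutes_per_year(nines: int, days_per_year: float = 365.0) -> float:
    """Downtime budget, in minutes per year, for an availability of `nines` nines.

    Five nines means 99.999% availability, i.e. an unavailability of 10^-5.
    """
    unavailability = 10 ** (-nines)
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {max_downtime_minutes_per_year(n):.2f} minutes/year")
```

With these assumptions, five 9s leaves a budget of a bit over five minutes of downtime per year.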
Apache Flink made its 1.2.0 major release with a bunch of new features:
- supports changing the parallelism of a streaming job by restoring it from a savepoint with a different parallelism
- redistribution of Kafka partitions and offsets among consumers
- async I/O operator
- queryable state
- allows users to restart from a 1.1.4 savepoint
- enhanced Table API & SQL (e.g. window aggregations over streaming tables)
Apache Storm released 1.0.3 as a maintenance release, to be followed by 1.1.0 soon.
Akka 2.4.17 was released with a patch for a potential security issue with Java deserialization.
The Databricks blog gives highlights and a tutorial of Intel's recently released BigDL project. BigDL will sit on the same level as Structured Streaming, MLlib and GraphX in the Spark stack.
Intel and Cloudera have collaborated to speed up Spark’s ML algorithms via integration with Intel’s Math Kernel Library. Benchmark results show a performance boost over both JVM-based execution and OpenBLAS.

Conferences

A recap of Machine Learning @Scale 2017 is posted on the Facebook Code blog. For those who, like me, are not familiar with this conference:

Machine Learning @Scale is an invitation-only technical conference for data scientists, engineers and researchers working on large-scale applied machine learning solutions
Keynotes and highlights from Databricks' speakers at Spark Summit East 2017 (with videos and slides) are available on their official blog. For users blocked from YouTube, check out this link.

Industry

Confluent grew subscriptions by over 700 percent in 2016 as businesses seize the power of real-time data. The Confluent Enterprise product is built around Apache Kafka™, including Control Center to manage Kafka at scale, the Kafka Connect API to connect with other systems and the Kafka Streams API for lightweight stream processing. Kafka in Big Data is similar to the pipe | in Unix.
Hortonworks reports record 2016 revenue of $184.5 million and fourth-quarter revenue of $52.0 million.
Evernote migrates to Google Cloud Platform with 5 billion notes and 5 billion attachments, or over 3 petabytes of data.
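The comparison between Kafka and the Unix pipe above can be made a little more concrete with a toy sketch. Everything here (ToyLog, ToyConsumer, poll) is a hypothetical in-memory stand-in that I made up for illustration; it is not the Kafka client API. It only shows the one idea that makes the analogy work, and the one place where it breaks down: producers and consumers are decoupled by an intermediate channel, but unlike a pipe the records are retained and each consumer tracks its own read offset.

```python
class ToyLog:
    """Append-only in-memory log (a stand-in for a single topic partition)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return all records at or after `offset` without removing them."""
        return self._records[offset:]


class ToyConsumer:
    """Reads from a ToyLog and remembers its own position."""

    def __init__(self, log):
        self._log = log
        self._offset = 0

    def poll(self):
        """Return the records this consumer has not seen yet."""
        records = self._log.read_from(self._offset)
        self._offset += len(records)
        return records


if __name__ == "__main__":
    log = ToyLog()
    log.append("signup")
    log.append("click")

    billing = ToyConsumer(log)
    analytics = ToyConsumer(log)

    print(billing.poll())    # billing sees both records
    log.append("purchase")
    print(billing.poll())    # only the new record
    print(analytics.poll())  # analytics still sees all three
```

A real pipe hands each byte to a single reader exactly once; the retained log is what lets several downstream systems consume the same stream at their own pace.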

Programming

What's Functional Programming All About? Li Haoyi gives his answer as:

The core of Functional Programming is thinking about data-flow rather than control-flow

He has written a lot of articles worth reading, as well as high-profile tools.
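As a small illustration of the data-flow idea quoted above, here is my own sketch (not an example taken from the article; all function names are made up). The first version threads everything through shared mutable state, so the required order of steps is only implicit. The second makes each step a function whose inputs and outputs are explicit, so the dependencies between steps are visible in the code itself.

```python
from collections import Counter


def top_words_control_flow(lines, n=2):
    """Control-flow style: steps communicate through a shared mutable dict."""
    state = {"words": [], "counts": {}}
    for line in lines:
        for word in line.split():
            state["words"].append(word.lower())
    for word in state["words"]:
        state["counts"][word] = state["counts"].get(word, 0) + 1
    ranked = sorted(state["counts"].items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def tokenize(lines):
    """lines -> lowercase words."""
    return [word.lower() for line in lines for word in line.split()]


def count(words):
    """words -> {word: occurrences}."""
    return Counter(words)


def top(counts, n):
    """counts -> the n most frequent (word, count) pairs, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_words_data_flow(lines, n=2):
    """Data-flow style: the pipeline is just the composition of its steps."""
    return top(count(tokenize(lines)), n)


if __name__ == "__main__":
    lines = ["a b a", "B a c"]
    print(top_words_control_flow(lines))
    print(top_words_data_flow(lines))
```

Both versions compute the same result; the difference is that in the second one you cannot call a step before its input exists, because the input is an argument rather than something another step was supposed to have written earlier.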

Streaming

Stateful processing has been added to Apache Beam, the unified model of batch and streaming. Here is a nice guide to walk you through the new feature.

That's all for the first reading in 2017. I'd like to write more posts this year, beyond these regular readings. Let's see if I can win the battle against procrastination.