Weekly Reading 0xE

2016-11-20

Last week, Apache Big Data Europe was held in Seville, Spain, from Nov. 14th to Nov. 16th. The conference was full of great content from Apache Big Data projects. My colleagues Karol and Huafeng co-presented Apache Gearpump Next-Gen Streaming Engine, covering the Gearpump-on-TAP use case and the latest performance data. Huafeng also gave a talk comparing modern stream processing engines on functionality and performance. I will write more about my takeaways here after going through the slides.

Apache Spark 2.0.2 has been released with stability fixes. All 2.0.x users are strongly recommended to upgrade. Kafka 0.10 support and runtime metrics have been added to Structured Streaming.

Kafka made up a large part of my readings last week.

Anil Kumar from WalmartLabs wrote about how Kafka has decentralized autonomous services and enabled agile development in Apache Kafka for Item Setup.
Confluent announced Interactive Queries for Kafka Streams. The queryable state is stored in embedded databases such as RocksDB. Under the hood, each Kafka Streams instance exposes its metadata, which a developer can look up for a given store name and key through the Interactive Queries API. There is no built-in RPC layer for distributed querying, but Confluent provides a reference REST-based implementation.
Confluent also walked through their contributions to the Kafka client ecosystem. Besides the Java client, they have focused on building a high-quality C client and wrapping it to provide clients in other languages.
Kafka Streams is a lightweight library with no built-in scheduler or cluster support. How can it be scaled? Kafka Streams - Scaling up or down explains it with a simple example.
Kafka Streams scaling is built on Kafka's consumer groups and their rebalancing feature. Here is an article to help you understand Kafka consumer groups.
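
To make the scaling idea concrete, here is a toy model of my own in plain Python, not Kafka code. It assumes a simple round-robin assignment, which stands in for Kafka's real partition assignors, and uses `crc32` as a stand-in for the hash in Kafka's default partitioner. All function names (`assign_partitions`, `partition_for_key`, `instance_for_key`) are hypothetical. It shows how partitions get redistributed when an instance joins the group, and how a key can be mapped to the instance that would host its state, which is roughly the lookup that the Interactive Queries metadata enables.

```python
from zlib import crc32


def assign_partitions(num_partitions, instances):
    """Spread partition ids over the live instances round-robin.

    Simplified stand-in for a consumer group rebalance: call it again
    with a new instance list whenever an instance joins or leaves.
    """
    if not instances:
        raise ValueError("at least one instance is required")
    assignment = {name: [] for name in instances}
    for partition in range(num_partitions):
        owner = instances[partition % len(instances)]
        assignment[owner].append(partition)
    return assignment


def partition_for_key(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32 is an assumption)."""
    return crc32(key.encode("utf-8")) % num_partitions


def instance_for_key(key, num_partitions, assignment):
    """Find the instance that owns the partition a key falls into."""
    partition = partition_for_key(key, num_partitions)
    for name, partitions in assignment.items():
        if partition in partitions:
            return name
    raise LookupError("no instance owns partition %d" % partition)


# One instance owns everything; starting a second one triggers a "rebalance".
before = assign_partitions(4, ["app-1"])
after = assign_partitions(4, ["app-1", "app-2"])
print(before)  # {'app-1': [0, 1, 2, 3]}
print(after)   # {'app-1': [0, 2], 'app-2': [1, 3]}
```

In this model an instance beyond the number of partitions gets an empty assignment, so the partition count is the upper bound on parallelism. Calling `instance_for_key("some-key", 4, after)` returns the name of the instance a query for that key would have to be routed to.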

What Artificial Intelligence Can and Can’t Do Right Now? Andrew Ng gives his answers.
- What can AI do?
  
  If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.
- What can't AI do?
  
  AI work requires carefully choosing A and B and providing the necessary data to help the AI figure out the A→B relationship.
  
  A is the input and B is the response. The necessary data means a huge amount of data, which Andrew calls the Achilles' heel of today's supervised learning software.
Algorithmia shared the lessons they learned deploying deep learning at scale. They find that the cloud is still in its infancy when it comes to deploying models into production.

That's all for this week. Happy reading!

ManuZhang's Blog