Weekly Reading 0x3

2016-04-30

A while ago I read Joe Hellerstein introducing great work by two of his former students, the Professors Peter, A and B. This week we have some good stuff from both Peters, A and B.

In this interview on distributed computing, Peter Alvaro explains why distributed programming is hard:

distributed systems are fundamentally hard, as we have always known because of the presence and interaction of two different forms of uncertainty.

The two forms of uncertainty are:

  1. asynchrony, which is uncertainty about the ordering and the timing at which messages will be delivered to different nodes.
  2. partial failure, which means that some of your compute components may fail to run, while others keep running and your program nevertheless gives an outcome, which may be incomplete or incorrect.
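To make the two forms concrete, here is a toy sketch in plain Python. It is not from the interview; the function names and the tiny "last write wins" register are my own assumptions, picked only to show how reordering and a crashed worker each change the outcome.

```python
from itertools import permutations


def apply_writes(writes):
    """Apply (key, value) writes in delivery order; the last write wins."""
    state = {}
    for key, value in writes:
        state[key] = value
    return state


def outcomes_under_reordering(writes):
    """Asynchrony: collect every final state reachable by reordering delivery."""
    return {
        tuple(sorted(apply_writes(order).items()))
        for order in permutations(writes)
    }


def partial_sum(shards, failed_workers):
    """Partial failure: worker i sums shard i; a crashed worker contributes
    nothing, yet the program still returns a total (possibly a wrong one)."""
    return sum(
        sum(shard)
        for worker, shard in enumerate(shards)
        if worker not in failed_workers
    )


if __name__ == "__main__":
    # Two writes to the same key: the final value depends on delivery order.
    print(outcomes_under_reordering([("x", 1), ("x", 2)]))

    shards = [[1, 2], [3, 4], [5, 6]]
    print(partial_sum(shards, failed_workers=set()))  # every worker ran
    print(partial_sum(shards, failed_workers={1}))    # worker 1 crashed silently
```

The same two writes can leave x as either 1 or 2, and the sum silently drops from 21 to 14 when one worker is lost. Neither case raises an error, which is the unpleasant part.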

Meanwhile, Peter Bailis encouraged everyone to get started in research, and gave an example showing that no one is born a researcher:

One of my closest colleagues started off doing technical support during the first dot-com boom with only an undergraduate degree in literature and no background in Computer Science. Today, my colleague is a tenure-track professor doing work I deeply respect and admire.

Guess who that colleague is? Peter A!

Peter B's blog is the best place to learn why distributed programming is hard.

This is the end. I mean, the end of the beginning.

Streaming

  • Real-Time Event Streaming: What Are Your Options? interestingly decomposes a typical streaming architecture into three major components:

    1. Producers publish event data into a streaming system after collecting it from the data source, transforming it into the desired format, and optionally filtering, aggregating, and enriching it. (e.g. Apache Flume, Streamsets Data Collector)
    2. Streaming system takes the data published by the producers, persists it, and reliably delivers it to consumers. (e.g. Apache Kafka, MapR Streams)
    3. Consumers are typically stream processing engines that subscribe to data from streams and manipulate or analyze that data to look for alerts and insights. (e.g. Spark Streaming, Apache Storm, Apache Flink, Apache Apex)
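    The three roles above can be sketched in a few lines of plain Python. This is only an in-memory toy under my own assumptions: the `Stream` class, the "sensor,value" line format, and the threshold alert are all hypothetical, and none of this is the API of Kafka, Flume, or any of the engines named above.

```python
from collections import defaultdict


class Stream:
    """Toy streaming system: keeps events in an append-only log and
    tracks a separate read offset for each consumer."""

    def __init__(self):
        self.log = []
        self.offsets = defaultdict(int)

    def publish(self, event):
        self.log.append(event)

    def poll(self, consumer_id):
        """Return the events this consumer has not seen yet."""
        start = self.offsets[consumer_id]
        events = self.log[start:]
        self.offsets[consumer_id] = len(self.log)
        return events


def produce(stream, raw_lines):
    """Producer: collect raw lines, filter out blanks, transform each
    'sensor,value' line into a dict, and publish it."""
    for line in raw_lines:
        line = line.strip()
        if not line:
            continue  # filtering step: drop empty input
        sensor, value = line.split(",")
        stream.publish({"sensor": sensor, "value": float(value)})


def consume_alerts(stream, consumer_id, threshold):
    """Consumer: read new events and keep those above the threshold."""
    return [e for e in stream.poll(consumer_id) if e["value"] > threshold]


if __name__ == "__main__":
    stream = Stream()
    produce(stream, ["a,1.5", "", "b,9.0", "a,3.0\n"])
    print(consume_alerts(stream, "alerts", threshold=5.0))
    print(len(stream.poll("audit")))  # a second consumer reads the full log
```

    Because the log is kept and each consumer has its own offset, a second consumer can read everything from the start without affecting the first. That decoupling is the main thing the middle component buys you.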

Machine Learning

JVM / GC

Container

Papers

News / History

  • ACM RECOGNIZES MAJOR TECHNICAL CONTRIBUTIONS THAT HAVE ADVANCED THE COMPUTING FIELD.

    The 2015 winners:

    • Richard Stallman, for the development and leadership of GCC (GNU Compiler Collection), which has enabled extensive software and hardware innovation, and has been a lynchpin of the free software movement
    • Brent Waters, for the introduction and development of the concepts of attribute-based encryption and functional encryption
    • Michael Luby, for groundbreaking contributions to erasure correcting codes, which are essential for improving the quality of video transmission over the Internet.
    • Eric Horvitz, for contributions to artificial intelligence and human-computer interaction spanning the computing and decision sciences through developing principles and models of sensing, reflection, and rational action.

Now the real end, and the end of April.