The build and deploy sequence for this tutorial is the same as for all others:
tutorials/trendfor the trending blog project.
app.configapplication configuration file to fit your setup.
% CIR=app.config 4crossbuild
% echo localhost | CIR=app.config 4deploy
The architecture of the trending blog app is illustrated below.
The Tumblr Firehose exposes a stream of events that indicate user actions taken on the Tumblr site. We are interested in two kinds of incoming events:
CreatePost events that announce newly created blog posts on Tumblr—each such event delivers the ID of the new post, as well as the ID of the blog it belongs to;
Like events indicate someone liked an existing post—these events carry the blog and post ID of the post being liked.
We aim to keep a live ranklist of at most 10 blogs, whose posts have received the highest number of “likes” throughout the past one minute. At the end, this top-ten ranklist is exposed to external clients via a WebSocket protocol. The intention is that a browser UI would listen to a continuously changing top-ten list and display it to the user live.
The Tumblr Firehose, built around Apache Kafka, is designed so it can serve multiple independent HTTP clients that share credentials. It automatically partitions the stream of live events roughly evenly across all participating clients dynamically.
Our app architecture consists of three differently-purposed sets of workers. A set of “mappers” each connects to the Tumblr Firehose and consumes events greedily as they come. The mappers forward each event to one of multiple “reducer” workers, which are each responsible for maintaining the like-count of a subset of the blog ID space.
Each reducer worker maintains a top-ten list for the blogs it is responsible for. At regular intervals an “aggregator” worker sweeps through all reducers, asking them for their top-ten opinion. It then combines all shard ranklists into a single global top-ten list.
The source of the tutorial is found in the circuit subdiretory
Mappers, reducers and aggregators are all implemented in package
trend/x. The executable command that spawns the service
trend-spawn is implemented in package
trend/cmd/trend-spawn. The tutorial includes a simple “auto-configuration” logic in package
trend/config which demonstrates a simple technique for determining what service configuration to use based on the circuit environment that the executable is being run in.
The source for this tutorial is one of our favorite examples how one can build a non-trivial map-reduce application from the fround up—i.e. without using any auxiliary infrastructure other than the circuit itself—in under 200 lines of code.