GO’CIRCUIT
A project by Petar Maymounkov.

Anchor file system

 
The anchor file system is a circuit runtime facility for keeping track of live circuit workers in a customizable, structured manner, with simple and expressive semantics for service registration and service discovery. The anchor file system is exposed both through the circuit programming interface and through the command-line toolkit.

Table of contents

Introduction
Design principles
Implementation
Programming interface
Command-line utilities
Zookeeper considerations
The future

Introduction

Cloud applications can grow fairly complex. In principle, one could implement an entire cloud backend, starting with the front-end HTTP servers, through the data analysis pipelines, and all the way down to the persistence layers, within a single circuit environment. There are good reasons to do so. For example, an entirely-in-circuit data processing pipeline can be made end-to-end idempotent with little effort.

Large systems inevitably require dynamic in-production “attention”: frozen processes need to be restarted, failed hardware replenished, and so on. For such tasks, it is essential to have a clear “map” of the processes running within a distributed deployment. For this purpose, traditional industrial infrastructures maintain complex systems of configuration files that identify supposedly live (i.e. deployed) worker processes, in conjunction with a usually-separate monitoring system that registers process failures. These systems suffer from a couple of problems:

The configuration files record which workers are supposed to be running, not which workers actually are, and the two views drift apart as processes fail and hardware changes.
Registration and failure monitoring live in separate systems, whose outputs must be reconciled before an accurate picture of the deployment emerges.

The anchor file system addresses both issues in a simple and practical manner.

Design principles

The anchor file system is a virtual distributed file system that keeps track of running worker processes across an entire circuit deployment. It is akin to Linux's process file system, procfs (conventionally mounted at /proc), but differs in a few significant ways.

Upon spawning a new worker process, the application logic spawning the worker elects zero or more anchor file system directories, under each of which the new worker registers itself by means of an automatically created anchor file. All anchor files corresponding to the same worker are identical, and each file's name equals the string representation of the worker's 64-bit identifier. Worker identifiers are unique across a circuit deployment. A typical path of an anchor file within the file system looks like this:

/firehose/clients/spam/R6aca01d945c3b861

The directory name /firehose/clients/spam/ is supplied by application logic upon spawning, whereas the file name R6aca01d945c3b861 is assigned automatically by the circuit runtime.

The desired directories do not have to exist on the file system a priori. In fact, there is no explicit mechanism for creating or removing directories. The circuit runtime automatically creates and deletes directories as needed, while ensuring that every existing directory has at least one descendant file. This behavior plays well with the live nature of anchor files (they disappear on their own when a worker dies, often unexpectedly) and relieves developers of clean-up responsibilities.

The anchor file system is effectively a name resolution system, as its main purpose is to discover the worker identifiers of desired subsystems of a larger circuit deployment. One could make a parallel between IP addresses and worker identifiers and, respectively, between DNS and the anchor file system.

The design choice that a worker can be registered under multiple directories is not coincidental. Vanilla file system semantics, whereby a process registers under a single path, are not powerful enough to incorporate process correlation information for the purpose of enabling easier discovery later on. In UNIX file systems, this shortcoming is addressed by the introduction of symbolic (and other) links. But practice has shown that file system links suffer from various problems. Creating links is essentially an attempt to predict future file system queries and to plant results for them. This makes links hard to use, and the follow-on queries more limited in scope and variability.

(For the more mathematical reader, a vanilla file system is no more than a recursive partition of the set of all files. Partitions, however, are a limited abstraction: They don't accommodate intersecting sets. This is the root issue we are aiming to fix.)

Instead, we take a different route. One can see a full equivalence between the anchor file system semantics and the GMail email labeling metaphor. Worker processes correspond to emails, and workers' anchor directories (or anchors, for short) correspond to email labels; recall that GMail allows for recursively structured labels. More formally, an anchor directory defines a subset of workers, namely the workers whose anchor files are found in that directory, that are functionally similar in some application-specific way. As a result of the freedom to register a worker in multiple directories, we are then able to query the file system against complex set-membership predicates, which are easily implemented using standard shell-based UNIX tools for file system access and text processing, as sketched below.
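For instance, here is a minimal sketch of one such set-membership query: finding the workers registered under both the spam and the dev anchors. It uses the 4ls tool introduced under Command-line utilities below, together with standard UNIX text tools; the directory names continue the hypothetical /firehose example used throughout this manual, and the temporary file names are illustrative:

% 4ls /firehose/clients/spam | grep -o 'R[0-9a-f]*$' | sort > /tmp/spam
% 4ls /firehose/clients/dev | grep -o 'R[0-9a-f]*$' | sort > /tmp/dev
% comm -12 /tmp/spam /tmp/dev
R486496d5c0419798
%

The grep step keeps only the worker identifier at the end of each anchor file path, and comm -12 prints the identifiers common to both sorted lists, i.e. the intersection of the two worker subsets.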

Implementation

The anchor file system is implemented on top of an underlying Apache Zookeeper service. This is a natural choice: Zookeeper supports a superset of the distributed synchronization primitives needed by an anchor file system, and both Zookeeper and its Go driver, gozk, have a proven track record of stability, which our experience has confirmed.

In order to keep the circuit clean and independent of the technology choices made in implementing its supporting functionality, as well as to give more transparency to engineers, we have elected not to have the circuit runtime manage the required Zookeeper service itself. Instead, a circuit deployment relies on an externally-managed Zookeeper service.

This simple setup has proven sufficient in all practical settings that we have encountered. That said, we wouldn't presume this holds true for everyone. While the file system is part of the circuit runtime, the latter is structured in a modular fashion (within the Go programming environment) so as to make implementation substitutions seamless. In particular, the programmatic interface required of a file system implementation is defined in package circuit/use/anchorfs, while its Zookeeper-backed implementation is isolated in a separate package, circuit/sys/zanchorfs. Inspecting the code makes it clear how to swap implementations in and out.

Programming interface

Up-to-date API documentation for the anchor file system is available within the godoc pages for the package circuit/use/anchorfs.

Command-line utilities

Comprehensive and up-to-date reference of all command-line tools is found in the Command-line toolkit manual. Here we introduce the main tool pertaining to file system management.

The tool, named 4ls, is a near equivalent of the traditional UNIX ls. Invoked at the command line, it merely prints the contents of an anchor directory, or of all of its descendants recursively. Listing a directory might look like

% 4ls /firehose/clients
/firehose/clients
/firehose/clients/R486496d5c0419798
/firehose/clients/dev
/firehose/clients/search
/firehose/clients/spam
%

Note that a directory can have both sub-files and sub-directories. Listing a directory subtree might look like

% 4ls /firehose/clients/...
/firehose/clients
/firehose/clients/R486496d5c0419798
/firehose/clients/dev
/firehose/clients/dev/R486496d5c0419798
/firehose/clients/search
/firehose/clients/search/R6aca01d945c3b861
/firehose/clients/spam
/firehose/clients/spam/R6aca01d945c3b861
/firehose/clients/spam/R486496d5c0419798
%

Note that the same worker may appear in multiple directories. You will also notice the familiar Go ellipsis notation for indicating recursion. By specification (not just by implementation), the anchor file system garbage-collects all empty directories and provides a strict guarantee that you will never encounter a directory with no descendant anchor files.

It might seem surprising that 4ls is the only command-line tool for interacting with the anchor file system. However, in conjunction with other command-line tools and the UNIX shell's redirection mechanisms, it becomes a powerful primitive. 4ls intentionally outputs a simple list of anchor paths, so that its output can be piped into other commands.
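For example, to count the distinct live workers under an anchor subtree (continuing the hypothetical listing above), one might write:

% 4ls /firehose/clients/... | grep -o 'R[0-9a-f]*$' | sort -u | wc -l
2

The sort -u step collapses the duplicate identifiers that arise when one worker is registered under several anchors, which is why the count is 2 rather than the 5 anchor files appearing in the listing.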

You will notice that the selector syntax for 4ls (namely, the use of the ellipsis) is borrowed from the go build tool. Other commands adhere to similar principles. For example, to shut down all internal upstream clients of the Tumblr Firehose service, we would type:

% 4kill /firehose/clients/...

Whereas, if we'd like to take down just the spam processing pipeline, we would write:

% 4kill /firehose/clients/spam/...
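In this particular hypothetical deployment, both workers happen to be registered under the spam anchor, so this second command kills every worker in the listing above. Their anchor files then vanish from all directories in which they appear, and the emptied directories are garbage-collected per the guarantee described earlier. A subsequent recursive listing might then look like

% 4ls /firehose/clients/...
%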

Zookeeper considerations

After a worker dies, there is a small delay before its corresponding file disappears from Zookeeper. Some delay is inevitable in general; Zookeeper in particular waits for a session timeout interval before it declares a process unreachable. The session timeout is currently hard-coded to 10 seconds in circuit/kit/zookeeper/zutil.

Note that Zookeeper has certain rules regarding allowable values for the session timeout, expressed in terms of the tick time. The latter, unfortunately, is configured in a different place: in the Zookeeper configuration file that you write before starting Zookeeper. We recommend not customizing this value; the default works well. That said, here is what the Zookeeper 3.4.5 documentation has to say:

“… The current implementation requires that the [session] timeout be a minimum of 2 times the tickTime (as set in the server configuration) and a maximum of 20 times the tickTime. …”
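Concretely, with Zookeeper's stock tickTime of 2000 milliseconds, sessions may time out no sooner than 2 × 2000 ms = 4 seconds and no later than 20 × 2000 ms = 40 seconds, so the circuit's hard-coded 10-second session timeout falls comfortably within the allowed range.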

The future

The anchor file system provides visibility and control over the live workers in a deployment. It is often handy to have the same convenience when inspecting dead workers post mortem, while investigating the causes of failed services. The current circuit system provides a myriad of tools (all discussed in Debugging and profiling, live and dead) for doing so by leveraging the logging facilities built into each worker. Nevertheless, we are considering a graveyard file system that would give a convenient interface into the life and death events of generations of past workers, as well as the parent-child relationships amongst them.