GO’CIRCUIT
A project by Petar Maymounkov.

Scale-free engineering

 

The end-goal of the Circuit Project can be summarized in one breath:

The human process of writing and sustaining distributed applications should be indistinguishable, from an application engineer's point of view, from that of not-distributed ones, and it should be minimal.

There is an obvious subtext here. Since we already have not-distributed systems that work well, the main implication is that we need to bring down the complexity of distributed systems to that same familiar level, and to keep it there going forward.

For the sake of convenience, we dub this desired state of the art scale-free engineering. The term scale-free, in the abstract, refers to something that does not change qualitatively as it grows in size. It was first introduced to a larger audience by the physicist Albert-László Barabási, in the context of studying the properties of organically-grown social graphs. The term also implies a certain "holographic" property, whereby any-size subset of the whole has the same effective properties as the whole. This is something we touch upon later.

To some, our end-goal may sound absurd. This is far from being the case. In fact, everything we know today suggests not only that this goal is attainable, but also that now is the ripe time to attack it.

The key high-level argument for the feasibility of scale-freeness is a simple observation, which we are about to unfold with care. A system (the package of a development framework and its runtime) has the sole purpose of translating human linguistic instructions into machine behavior. The feasibility of any such system depends on two factors alone: the programming language(s) exposed to the human operator, and the limitations of the hardware that must be harnessed to realize the linguistic instructions faithfully, while exposed to unexpected (but of known character) extraneous conditions that affect the hardware.

In defining the term system as we have, it is imperative to make a clear distinction between two roles of human involvement. A system caters to the needs of application engineers, who use it to write domain-specific software. We ourselves, the makers of such a system, are system engineers. It is no coincidence, nor a detail to be overlooked, that scale-freeness addresses indistinguishability specifically and only from the point of view of the application engineer.

With this basic terminology under our belt, we proceed with the high-level argument asserting the feasibility of scale-free engineering.

Feasibility

Most successful systems (C/C++/LLVM, Java, Python, MatLab, FORTRAN, etc. and the supporting software/OS stack below them) in existence today harness a single computer. The hardware of every computer today — a desktop, a notebook, a hand-held — can be summarized in a common frame: It consists of multiple error-prone components that communicate with each other over error-prone communication channels. (Note that errors in communication or computation are a harsher model of the world, compared to failures or erasures, since they are significantly harder to detect and handle.)

Despite the non-zero probability of errors within just about every aspect of a typical computer, programmers of single-computer applications maintain no concept of failing hardware. All such extraneous (to the application) conditions are concealed away and reduced to an extremely rare, singular all-fail condition. This is achieved entirely in software, spread across various layers of a sophisticated stack: hardware ROM drivers, the operating system, and supporting user-space software.

That same frame accurately describes the hardware of any distributed application — be it Google's multi-datacenter infrastructure or an Ericsson telecom switch box or even a collaborative peer-to-peer network.

The difference between distributed and not-distributed hardware (in the informal sense of "distributed", because ultimately everything is distributed) is purely quantitative. Datacenters have more components than one computer (tautologically), and IP networking has a higher probability of failure than motherboard buses. But that is it. Qualitatively, or behaviorally, they are the same. And since software design is entirely a qualitative pursuit, it follows by mere implication that current language and software design should be fully able to conceal away the complexities of a whole Internet.

That same statement, in more down-to-earth engineering language, says that we need to build an operating system that presents a clean, uniform view over a complex and error-prone underlying hardware network, and to equip it with a linguistic apparatus that enables human productivity, just as Linux is equipped with a wealth of languages that enable today's small-scale science and technology.

Walk into the near future

Taking a minute to contemplate what it would mean to bring distributed systems to par with not-distributed ones exposes countless benefits that have the potential to alter the trajectory of an economy.

For loops are MapReduce, enabling science

The future, as seen from the programmer's point of view, was one of the early ideas that fueled the circuit. Consider a for-loop over the elements of a large input array in memory. If the input grows even barely past the RAM capacity of a reasonable personal computer, one needs to rewrite this 3-line for-loop into what is now known as a MapReduce: a heavyweight construction, orders of magnitude more complicated than the for-loop (even purely in terms of application, rather than systems, engineering involvement in most interesting cases).
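To make the gap concrete, here is a minimal Go sketch (plain Go, not circuit code, with illustrative names): first the in-memory for-loop, then the same sum restructured into the map/reduce shape of per-chunk partial results plus a combining step. On a single machine the restructuring costs a few extra lines; once the chunks live on other machines, the same shape drags in data partitioning, scheduling and failure handling.

package main

import "fmt"

// sumLoop is the 3-line for-loop from the text: sum a large in-memory array.
func sumLoop(input []float64) float64 {
	sum := 0.0
	for _, x := range input {
		sum += x
	}
	return sum
}

// sumMapReduce computes the same sum in map/reduce form: a partial sum per
// chunk ("map"), combined into a total ("reduce"). This is the shape one is
// forced into once the input outgrows a single machine's RAM.
func sumMapReduce(input []float64, chunk int) float64 {
	partial := make(chan float64)
	n := 0
	for lo := 0; lo < len(input); lo += chunk {
		hi := lo + chunk
		if hi > len(input) {
			hi = len(input)
		}
		n++
		go func(part []float64) {
			partial <- sumLoop(part) // "map": partial sum of one chunk
		}(input[lo:hi])
	}
	total := 0.0
	for i := 0; i < n; i++ {
		total += <-partial // "reduce": combine the partial results
	}
	return total
}

func main() {
	input := []float64{1, 2, 3, 4, 5, 6, 7, 8}
	fmt.Println(sumLoop(input), sumMapReduce(input, 3))
}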

Extending this example to the full reach of a modern programming language clearly has far-reaching consequences. For one, this approach has produced the legendary reliable distributed systems of Ericsson which, albeit fairly narrow-purposed, have shown us how complex and reliable software can be with the right surrounding system.

Looking into the future is considerably more exciting, though. Small applied-theory teams, like the recently groundbreaking team of Geoffrey Hinton, focus on the early design stages of novel algorithms that later turn into the Googles of the world. In some cases, these teams are able to carry the realization of their ideas beyond the theoretical, as they are quite well equipped to implement scientific computations within application programming environments like MatLab and high-level languages like Java and Python. Recently, however, the most interesting algorithmic research has been shifting towards the large scale (requiring multiple computers), which has rendered most pre-existing tools too cumbersome to re-purpose for the multi-machine setting without a highly specialized low-level/systems team that most applied groups cannot afford. Famously, Geoffrey Hinton's groundbreaking research in neural networks has had, in his own (rephrased) words, diminishing returns not due to a lack of ideas but due to computational obstacles. Consequently, he left academia in a talent acquisition and joined Google, a company that can act as an infrastructure provider to his talent.

One has to wonder how many less visible research teams stop short of great accomplishments for similar reasons.

Pleasingly, we have already essentially achieved linguistic parity between distributed and not-distributed programming, and our circuit tutorials clearly demonstrate the validity of the claims above.

Executables instead of SaaS or enterprise

One of the less noticeable, but extremely important, ideas embedded in scale-free engineering is binary and execution parity with not-distributed systems. It is easy to overlook the profundity of distributable binaries (as in "we can give out our compiled executables to others, who can execute them without the source") in the single-machine world that we have. Not only are they able to run in forwards- and backwards-compatible ways on ever-so-slightly changing underlying (single-machine) hardware, but they are also able to dynamically acquaint themselves with the surrounding execution environment and adapt themselves to play equitably and collaboratively with the other, unknown pieces of software sharing the hardware:

Executable binaries are runtime-aware.
This is accomplished entirely automatically and it is amazing.

If we were to imagine executable binaries for distributed systems — i.e. if a distributed analog of the single-machine executable existed — the possibilities are impressive.

Analogously to a traditional executable, a distributed one should be equipped to run on a heterogeneous array of computer hardware. And when executed, it should familiarize itself with its surrounding environment (possibly an entire datacenter) in a simple, standardized way, and it should arrange to execute itself across a potentially complicated multi-datacenter system in a collaborative manner.

Distributed executables fill an arguably already-existing gap between two extreme ways of conveying software: as a service (SaaS) and as a hardware box pre-loaded with software (an enterprise solution). For some companies SaaS is too light-weight, for others enterprise solutions are too heavy-weight. SaaS simply cannot be used for time-sensitive or mission-critical tasks, since it requires that information flow through the slow and insecure Internet. Enterprise solutions, on the other hand, can be either too expensive or too inflexible to incorporate into a company's logic.

Distributed executables are the middle-ground between SaaS and enterprise solutions.

We envision that, with distributed executables, one should be able to deploy a cluster service (like the Solr search engine) merely by invoking an executable under a simply specified Service Level Agreement (SLA).

What makes runtime-awareness (and therefore executable binaries) possible in the single-machine world is a highly abstracted and standardized view over the complex hardware resources and low-level algorithms provided by a computer. For example, the remarkable abstraction of the file system is a uniform lens into every type of storage device, and not only that: it has been better than sufficient for abstracting away even network services (in the Plan 9 operating system) and stateful runtime systems (like the process file system procfs, mounted at /proc in Linux, as well as device file systems on most POSIX systems). Another almost-invisible abstraction is the job scheduling mechanism of every modern OS or language runtime, which unlocks concurrency even within the constraints of a single CPU.
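As a small illustration of that lens (a plain Go snippet, Linux-specific because it relies on procfs), a process can read its own runtime state through the ordinary file API, exactly as it would read a file on a storage device:

package main

import (
	"fmt"
	"os"
)

func main() {
	// procfs exposes process runtime state as ordinary files;
	// /proc/self/status describes the calling process itself.
	status, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Print(string(status))
}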

Trends in industrial cloud computing are already suggesting a clear trajectory in that same direction. General-purpose distributed file systems (and protocols) that encompass multiple datacenters are already in place. Such are Microsoft's Wide-area File System and Google's Colossus File System. Datacenter-scale job schedulers (the analogs of traditional process and thread schedulers) also abound. Early examples in this category are Google's Borg scheduler, the academic project Mesos, and many others.

It is therefore clear that uniform datacenter abstractions are not only feasible, but are the silent choice of the engineering majority.

Closed circuits

Long-running applications in the traditional world are a remarkable concept: these are programs so well designed and self-contained that they can run effectively forever, unsupervised. Achieving their distributed analog is a valuable target. The engineering community has already witnessed the power of this concept. Ericsson's telecom boxes enclose hundreds of computers, running millions of lines of code that implement complex, interdependent distributed topologies. These boxes can run uninterrupted and unsupervised for years, relying solely on hotfixes over the Internet.

Interestingly, this level of reliability has never been attained in the datacenter world of Internet companies, banks, etc. One could speculate as to why this is the case. Olivier Pomel (founder of DataDog) has insightfully suggested to me that the reasons are cost, economics and short-sightedness: when faced with a trade-off, most companies prefer a higher-velocity product development cycle over a more reliable technology. It is cheaper for a fast-paced company to hire human support staff to oversee an imperfect technology than to invest large amounts of money in complex, out-of-expertise, fundamental long-term solutions to distributed operational problems.

The unique nature of Ericsson's products has rendered reliability an absolute priority, which has pushed them onto a very different engineering path than most other companies. Ericsson's accomplishment is due to the successful design of the Erlang distributed programming language and its "dual" runtime, the Open Telecom Platform (OTP). Ericsson's technology has not been able to gain traction within the Internet world, despite concerted effort, due to the narrow application-domain focus of the language itself. This, however, is an accidental hindrance, not an essential technological crux. The semantic aspects (e.g. the notion of spawning a thread) and primitives (like communication channels) of Erlang that pertain to the distributed nature of the problem are not domain-specific and, in fact, they are found in modern Internet languages like Go. This informed our natural choice of Go as the "birth" language for the circuit.
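To illustrate that parallel (a generic Go sketch, not Erlang/OTP and not circuit API), the Erlang idiom of spawning a lightweight process and exchanging messages with it maps directly onto goroutines and channels:

package main

import "fmt"

// worker handles requests arriving on one channel and sends replies on
// another, in the style of an Erlang process receiving and sending messages.
func worker(requests <-chan string, replies chan<- string) {
	for req := range requests {
		replies <- "handled: " + req
	}
}

func main() {
	requests := make(chan string)
	replies := make(chan string)

	go worker(requests, replies) // "spawn", in Erlang terms

	requests <- "connect subscriber 42"
	fmt.Println(<-replies)
	close(requests) // ends the worker's receive loop
}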

We have chosen to dedicate a term — closed circuits — to such self-sustained, long-running distributed applications, because they enable novel uses. These remarkable new uses are discussed in the next section. Here we ponder the feasibility of closed circuits for a moment more.

It is pretty clear that in order to enable complex applications that are robust to the myriad interdependent failure events of distributed systems, these failures must be concealed from the application engineers. Indeed, if this is not the case, not only will distributed application code not be portable, but complex applications will not be possible at all. In fact, current industrial trends already suggest the latter is happening: most mid-size businesses today are forced to use clumsy systems like Hadoop for lack of better choices, and because of this they operate in a constant regime of product compromises that stay within the feasibility realm of, in this example, Hadoop. This informs a design axiom that the circuit project adheres to:

Application engineers should be entirely shielded from any non-application events.
(Non-application events encompass all asynchronous, external error conditions that are artifacts of the physical implementation of the computational platform, and more.) Attaining this clean isolation between application and system has been attempted many times, but not in a sufficiently general way. For example, Twitter's Storm project achieves programmability of consistent, massively-scaled data-processing applications in the streaming model. Unfortunately, it pertains only to data processing and only to the streaming model.

The circuit design, on the other hand, is general purpose. One could implement a low-level application, like Google's distributed locking service Chubby, as well as higher-level data applications, like those targeted by Storm.

Google in a bottle

Once a massively-distributed software system is able to operate like a calculator, without the supervision of its creator, entire new industrial branches could emerge. Nothing precludes an entire web platform, like Google, from being distributed as a single, albeit large, executable. In principle, one should be able to buy such an executable and deploy it with a (virtual or not) hardware provider like Amazon EC2. Naturally, for this to be feasible, not only do cloud applications need to be representable in the form of portable binaries, but they additionally need to be closed-circuit applications, so that they can provide fire-and-forget functionality to the user.

Such a deployment mechanism addresses another emerging, but immature, trend in the industry: that of turnstile services. Some entrepreneurs, like those behind Diaspora, have been trying to provide web services in the form of simple, self-contained executables that a user could deploy on their own choice of hardware provider, so as to ensure that the underlying software does not leak or mismanage private information. A Diaspora turnstile node, for example, corresponds to a single user's Facebook account, so to speak. Current turnstile technology is quite primitive. It typically constitutes software that, from the point of view of a hardware provider like Amazon, executes on a single machine.

Such turnstile solutions have had a hard time getting off the ground, despite the fact that they make good sense, because without a common distributed standard, hardware providers are not able to make a single offering that easily runs the heterogeneous set of turnstile solutions. The circuit, a system that lives on top of a traditional OS like Linux, already allows turnstile solutions to be deployed over pre-existing Linux cluster providers.

In technical terms:

Distributed executables that are closed circuits enable delegation of arbitrary cloud applications on trusted hardware.
In fact, distributed executables pave the way to an even more ambitious accomplishment of science, one already settled at the theoretical stage:
Closed circuits are necessary for securely delegating computation to third parties that are entirely untrusted.

Indeed, we already know how to perform private computation on untrusted hardware without leaking any information, a technique known under the technical name of homomorphic encryption. The crux to its wide realization in practice involves establishing standards for portable distributed executable formats, as well as micro-optimizations in the way homomorphic encryption is implemented at the virtual-instruction level.

Public software transparency

Metaphorically, we refer to distributed circuit executables as running "in a bottle" because they are isolated from interactions with the outside world beyond what the user allows, and furthermore most of their operation is transparent and observable from the side. For example, some early-stage projects like Obelisk aim at providing full real-time visibility into the information flow incurred by a running circuit application.

This opens new avenues for software utilization in sectors of the economy that, for legal and other reasons, were previously unable to use third-party software. For example, a government security agency could deploy a third-party distributed executable without risking information leaks, thanks to the information flow control afforded to the executing party by a system like the circuit. Furthermore, such an agency would be able to continuously monitor the progress of the running cloud software, using transparency monitoring tools like Obelisk. This opens a new win-win avenue, whereby the agency takes no information-leakage risk while the third-party provider does not have to disclose their know-how, since they deliver compiled distributed executables instead of source code. These benefits come in addition to the fact that outsourcing is much more frictionless when binaries, rather than source, are at play.

Timeless distributed virtual machines

In recent years, the Computer Science community has grown increasingly concerned with the reproducibility of past software executions and the readability of legacy file formats. Some of the solutions holding promise in this space entail building virtual machines that can execute legacy code, and therefore run the legacy tools that can read legacy formats.

An identical concern should be growing within the cloud computing community. The algorithmic history of web sites like Facebook holds valuable information about our own history. To reproduce and preserve the behavior of a cloud application as complex as Facebook, one must build it within a unified framework, where the program source describes the full behavior of the final product. By contrast, current practice builds individual components (databases, web servers, caches, etc.) separately and subsequently glues them together at runtime via a combination of scripts and real-time administrator involvement. This approach is impractically hard to replicate, and no one would likely ever dare to.
