GO’CIRCUIT
A project by Petar Maymounkov.

Vena: Substitute for OpenTSDB based on LevelDB

 

This is an advanced tutorial that focuses on system design. For a tutorial on the basics of bulding and deploying, consider the Hello, world! tutorial.

The OpenTSDB system is a distributed database for storing and querying streaming time series, based on Hadoop infrastructure. OpenTSDB is fast and efficient due to its simple access interface. While conceptually simple, OpenTSDB suffers a few shortcommings in practice. It is hard to sustain at scale, because maintaining and tuning the underlying HBASE cluster becomes complex fairly quickly. The simple semantics of OpenTSDB are often slightly misaligned with the needs of the application at hand, which often inspires cumbersome workarounds. The aggregation algorithms of OpenTSDB do not have a sound statistical foundation and are not easily modified.

This tutorial showcases the simplicity of building an OpenTSDB equivalent using the circuit and LevelDB—the building block of Google's BigTable— as a persistence layer. We argue that it is more practical to build your own specialized database using the circuit, than to retrofit an existing one to your needs.

Starting from the baseline implementation here, it is very easy to “grow your logic organically” and incrementally.

Quick start

Navigate to the subdirectory tutorials/vena of your circuit clone. Within it, find a sample app configuration file called app.config. This configuration is geared towards a quick and simple build and deploy on the local machine. The cross-build tool is configured to perform the build on the local machine, and all necessary repos (Go, circuit and app) are rsync'd locally into the build jail. You must modify the hardcoded paths with ones applicable to your system.

You can then build and deploy the Vena app. From subdirectory tutorials/vena:

% CIR=app.config 4crossbuild
This will result in a binary distribution in local directory ~/.vena/ship. Deploy your distribution to localhost:
% echo localhost | CIR=app.config 4deploy
This will install the deployment in ~/.vena/deploy. Launch the Vena shard workers:
% CIR=app.config ~/.vena/ship/vena-server-start -vena vena.config
Note that vena-server-start intentionally does not exit after the workers are started. You can Ctrl-C-exit from it at anytime without affecting the circuit workers. While vena-server-start is live, however, the standard output and error of all vena shards will be forwarded to your terminal, which is handy during debugging.

The syntax of the Vena configuration file is explained in the next section. The flag -vena specifies the name of the Vena configuration file. To start the front workers:

% CIR=app.config ~/.sumr/ship/vena-front-start -vena vena.config -front front.config
Similarly, vena-front-start will not exit after starting the front workers, unless you press Ctrl-C or kill the process otherwise. The front workers configuration file is also described below.

Finally, you can test your deployment:

% telnet localhost 50600
put a 0 0.1 p=q x=y
put a 1 0.2 x=y
put b 2 0.3 p=q x=y
put b 3 0.4 x=y
query a 0 1 p=q
query b 2 3 x=y
dump
The first query should return
0000000000  0.10   1
The second,
0000000002  0.30   1
0000000003  0.40   1
And the dump command should return
PUT ·e40c292c 0000000000 0.100000 ·43ef=·425c ·5087=·4ef4 
PUT ·e40c292c 0000000001 0.200000 ·5087=·4ef4 
PUT ·e70c2de5 0000000002 0.300000 ·43ef=·425c ·5087=·4ef4 
PUT ·e70c2de5 0000000003 0.400000 ·5087=·4ef4 

Configuration

Two configuration files specify the database and front workers independently.

Vena configuration. In JSON format, this configuration file describes the set of LevelDB shard workers to be spawned.

{
	"Anchor": "/vena/server",
	"Shard":  [
		{
			"Key":   0,
			"Host":  "h0.datacenter.net",
			"Dir":   "/var/vena/0",
			"Cache": 90e9
		},
		{
			"Key":   1,
			"Host":  "h1.datacenter.net",
			"Dir":   "/var/vena/1",
			"Cache": 90e9
		}
	]
}
Key is the desired location in the XOR-metric of the given shard, Host is the host where the shard worker will be spawned, Dir is a local directory on that host for the LevelDB files, and Cache is the number of bytes that LevelDB should use for in-memory caching.

Front configuration. In JSON format, this file describes a set of front workers, capable of speaking the OpenTSDB protocol and translating it to often-parallel calls to the shard workers:

{
	"Anchor": "/vena/front",
	"workers": [
		{
			"Host":     "h5.tumbrl.net",
			"HTTPPort": 50500,
			"TSDBPort": 50600
		},
		{
			"Host":     "h6.tumbrl.net",
			"HTTPPort": 50501,
			"TSDBPort": 50601
		}
	]
}
Host is the host where the worker will be spawned, HTTPPort is the port of the HTTP service at this worker, and TSDBPort is the port of the TSDB service at this worker. The HTTP endpoint is not implemented—it is left as an exercise—and in that sense the HTTPPort fields are placeholders. However, they should not be left empty as they will not parse correctly.

Design overview

OpenTSDB is a distibuted database for streaming time series, used by many Internet companies. Architecturally, OpenTSDB defers to Hadoop's HBASE distributed database, a la Google BigTable, for data persistence. This dependance on HBASE complicates the upkeep of OpenTSDB as HBASE needs to be configured separately and in a manner that nurtures caching efficiency.

Illustration of the Hello, world! application logic.

For legacy clients' sake, Vena sports a front-end service that supports the OpenTSDB protocol. The front workers listen on two ports: one for incoming connections with OpenTSDB semantics, and one for HTTP requests. The latter is not yet implemented and left as an easy exercise.

Operations

The following commands can be performed over a telnet connection to the OpenTSDB endpoint of a Vena front worker.

% telnet localhost 50600

PUT

The “write” operation for inserting data is put, illustrated below. Each data point belongs to one time series and pertains to a one-second interval in time. The time series is uniquely identified by the combination of a metric name and zero or more tags, which are opaque key-value pairs.

It is useful to think of a data point as a triple (time, space, value). “Time” is the timestamp, “space” is the identity of the time series the point belongs to (equivalently, the combination of a metric name and set of tags) and “value” is the value of the data point.

Vena shards data points pseudo-randomly based on their space dimension. To guarantee good parallelization at query time, it is the user's responsibility to ensure that points of are partitioned into roughly evenly sized time series. One approach, suggested by OpenTSDB, is to use a host=… tag on each data point. Since queries are always metric-specific, ensuring variability in the tag set results in higher parallelization of query execution.

Put requests do not get a reply for efficiency purposes.

Illustration of the Hello, world! application logic.

QUERY

The query format is illustrated below. At high-level a query selects a set of input time series, “merges” them into a single time series and returns a restriction to a given time interval. In this “tutorial implementation” of Vena, “merging” involves taking the union of all points across time series, while adding the values of temporally coincidental points. Implementing variations on this is a short exercise.

Syntax-wise, the first word in the query, labeled METRIC in the illustration, constrains the input to time series with the given metric. A list of zero or more tag clauses may follow. A tag clause is a key-value pair. Key-value pairs constrain the input further to series that contain the tag clauses. (OpenTSDB also supports a wildcard tag clause that we omit from our implementation.)

Illustration of the Hello, world! application logic.

The result of a query is a time series returned one point per line. Each line consists of three space-separated numbers: timestamp, value and number of input points added into this output point. A query response might look like so:

0000000002  0.30   1
0000000003  0.40   1

DUMP

The singleton dump command will return all data points stored across all shards. The response will look like this:

PUT ·e40c292c 0000000000 0.100000 ·43ef=·425c ·5087=·4ef4 
PUT ·e40c292c 0000000001 0.200000 ·5087=·4ef4 
PUT ·e70c2de5 0000000002 0.300000 ·43ef=·425c ·5087=·4ef4 
PUT ·e70c2de5 0000000003 0.400000 ·5087=·4ef4 
The prefix PUT is included with forsight that some day we might want to be able to forward the dump of one Vena database as input to another. Altogether the output format of a dumped point is identical to the put command format, except that textual metric, key and value names are replaced with their internal IDs.

To recover the original metric, key and value names one needs to implement an reverse lookup server, which we leave as an exercise, because it is not neccessary for the essential function of the service, which is querying.

Code pointers

Vena server shards are implemented in vena/server. Front workers are in package vena/front. The client library that the front workers use is found in vena/client: This is essentially a smart-client library that users of Vena that live within the circuit can use out-of-the-box, whereas external apps have to resort to its OpenTSDB interface.