Optimized time series management

NSDb is a time series database, thus it is optimized for handling time series data, i.e. a series of multi dimensional data points indexed by time. A time series record, called Bit in the NSDb nomenclature, is a structure composed of:

If we want to model the number of accesses to emergency rooms in a country (or in a state or a region) we can use a time series record (a Bit) with the following features:

Metrics and Bit

A series of time series records (bits) is called metric.

NSDb provides two levels of containers for metrics: database and namespace.

A bit belongs to a namespace, which belongs to a database.

To declare the database and the namespace is not required in a sql statement, but it is necessary for the outer context (e.g. rest apis, java apis or cli)

Metric Info

The following properties can be configured for a metric. This configuration must be performed before inserting bits and cannot be updated.


Schema design

Given the time series nature of NSDb, it would be improper to talk about a metric’s schema because each bit can have different dimensions. Anyway, it is important to point out that bits of the same metric must be homogeneous to each other. The following rules must be followed by a series of Bits:

Given the above preconditions, it is possible to infer and retrieve a schema for a metric, which will be composed of:


Sql support

NSDb supports a custom SQL like query language. For more information, the full doc could be examined.

Essentially, dimensions and timestamp can be used to compose where conditions or group by clauses and value for aggregation clauses.

Reusing the ER example explained above, in order to select accesses with WHITE priority and sort results by the time we can run this query:

select * from accesses where priority = WHITE order by timestamp desc

while for retrieving the sum of accesses grouped by priority this query can be used

select sum(value) from accesses group by priority

Publish-Subscribe Streaming

Perhaps the most interesting feature of NSDb is a special case of data streaming based on a publish-subscribe pattern.

Publish-Subscribe Model Publish-Subscribe Model A user needs to subscribe by performing the following steps:

NSDb will return the query results, the Historical data first, then, every time a record that fulfills the query provided is inserted, it will be sent to the subscriber’s websocket. If we consider NSDb as a Source and the subscriber as a Sink, this publish-subscribe mechanism can be interpreted as a simple streaming pipeline.


Overall Architecture

Overall Architecture Publish-Subscribe Model

NSDb strongly relies on akka, the implementation of the Actor Model on the JVM.

This model gives the right abstraction to build high traffic, concurrent and distributed systems. Using it makes easy to handle concurrency, and also provides an abstraction for creating and maintaining a cluster of actors (e.g. it’s not necessary to implement gossip and anti-entropy protocols).

Storage System

NSDb takes full advantage of Apache lucene™, thanks to which it is possible to index bits and run high performance searches. It also provides many primitives useful to optimize indexes and related queries to fit time series requirements.

In addition to Lucene, the native file system is used for writing a commit log, used to keep track of all the writes and deletions happened in the system; it is essential in case of disaster recovery. In fact, all the log entries are written before being performed, so that it is possible to replay them in case of failure.

Consistency

In order to maximize write throughput and also to guarantee a real time events delivery, It has been decided to adopt a custom consistency strategy. After the write operation has been logged in the commit log, it is accumulated into an internal buffer which is periodically flushed into the index, and in parallel it is checked against all the subscribed query and published to all the suitable subscribers. This means, that from the subscriber point of view, the effects of an insert operation will be observable in real time, while for ordinary users, NSDb behaves like a eventually consistent system, i.e. records are stored only after a configurable interval.

Shards

One of the most relevant properties of a time series database is that data continuously comes, will be always inserted and never updated, and given the nature of the data itself, that is the fact that it’s naturally ordered by time, leads us to adopt a sharding strategy called Natural Time Sharding.

Users can configure a sharding interval

sharding { 
    interval = 15s //15m 15d 1y are valid values
}

according to which, shards will be created.

Privacy Policy Radicalbit S.r.l. • VAT IT09292580967‬