
The Ceres Database

Ceres is a time-series database format intended to replace Whisper as the default storage format for Graphite. In contrast with Whisper, Ceres is not a fixed-size database and is designed to better support sparse data of arbitrary fixed-size resolutions. This allows Graphite to distribute individual time-series across multiple servers or mounts.
Ceres is not actively developed at the moment. For alternatives to Whisper, look at alternative storage backends.

Storage Overview

Ceres databases consist of a single tree, contained within a single path on disk, which stores all metrics as nodes in nested directories.
A Ceres node represents a single time-series metric and is composed of at least two data files: a slice that stores the data points, and an arbitrary key-value metadata file. The only metadata a node requires is a 'timeStep', which defines the finest resolution that can be used for writing. A Ceres node can, however, contain and read data at other, less-precise resolutions in its underlying slice data.
Other metadata keys that may be set for compatibility with Graphite are 'retentions', 'xFilesFactor', and 'aggregationMethod'.
A Ceres slice contains the actual data points in a file. The only other information a slice holds is the timestamp of its oldest data point and its resolution, both of which are encoded in its filename in the format timestamp@resolution.
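For illustration, the on-disk layout of a single node might look like the following (the metric path and timestamps are hypothetical, and the '.ceres-node' metadata filename and '.slice' extension are taken from the reference Ceres implementation):

/opt/graphite/storage/ceres/carbon/agents/host-a/cpuUsage/
    .ceres-node              JSON metadata, e.g. {"timeStep": 60, ...}
    1420070400@60.slice      points starting at timestamp 1420070400, 60s resolution
    1421280000@60.slice      a later slice, created after a large gap in the data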
Data points in Ceres are stored on disk as a contiguous list of big-endian double-precision floats. The timestamp of a data point is not stored with the value; instead, it is calculated as the timestamp of the slice plus the value's index offset multiplied by the resolution.
The timestamp is the number of seconds since the UNIX Epoch (01-01-1970). The data value is parsed by the Python float() function and as such behaves in the same way for special strings such as 'inf'. Maximum and minimum values are determined by the Python interpreter’s allowable range for float values which can be found by executing:
python -c 'import sys; print(sys.float_info)'
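As a sketch of how a slice can be decoded, the following reads every data point from a slice file and reconstructs each point's timestamp from the filename, following the format described above. The function is illustrative, not the Ceres library's actual API:

import os
import struct

def read_slice(path):
    # The filename encodes the slice's start timestamp and resolution
    # in the format timestamp@resolution
    name = os.path.basename(path).split('.')[0]
    start, resolution = (int(part) for part in name.split('@'))

    with open(path, 'rb') as f:
        data = f.read()

    # Each data point is a big-endian double ('!d'), 8 bytes; a point's
    # timestamp is the slice timestamp plus its index times the resolution
    usable = len(data) - len(data) % 8
    return [(start + i * resolution, value)
            for i, (value,) in enumerate(struct.iter_unpack('!d', data[:usable]))]

Calling read_slice('1420070400@60.slice') would then yield pairs such as (1420070400, 0.5), (1420070460, 0.7), and so on.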

Slices: Precision and Fragmentation

Ceres databases contain one or more slices, each with a specific data resolution and a timestamp marking the beginning of the slice. Slices are ordered from the most recent timestamp to the oldest. The resolution of data is not considered when reading from a slice; it matters only when writing, which requires that a slice with the finest precision configured for the node exists.
Ceres handles gaps in data by padding slices with null data points. If a gap is too big, however, a new slice is created instead. If a Ceres node accumulates too many slices, read performance can suffer; this can be caused by intermittently reported data. To mitigate slice fragmentation, there is a tolerance for how much space may be wasted within a slice file before a new one is created. That tolerance is determined by 'MAX_SLICE_GAP', the number of consecutive null data points allowed in a slice file.
If set very low, Ceres wastes less disk space on padding but becomes prone to the performance problems caused by slice fragmentation, which can be severe.
If set very high, Ceres wastes more disk space: each null data point wastes 8 bytes, and you must also keep your filesystem's block size in mind. If you suffer from slice fragmentation issues, you should increase this value or defragment the data more often. It should not be set excessively high either, because a large but allowed gap must be filled in with nulls; instead of a simple 8-byte write to a new file, this could mean an (8 * MAX_SLICE_GAP)-byte write to the latest slice.
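A minimal sketch of the gap-handling decision, assuming null data points are written as NaN and that the caller tracks how many points the slice already holds (the helper and its signature are hypothetical):

import math
import struct

MAX_SLICE_GAP = 80  # consecutive null points tolerated before starting a new slice

def append_point(slice_path, slice_start, resolution, points_written, timestamp, value):
    # Index this point would occupy in the slice, relative to its start
    index = (timestamp - slice_start) // resolution
    gap = index - points_written  # missing points since the last write

    if gap > MAX_SLICE_GAP:
        # Too much wasted space: the caller should create a new slice
        # named timestamp@resolution instead of padding this one
        return False

    with open(slice_path, 'ab') as f:
        # Pad the gap with nulls, then append the new value; when the gap is
        # large but allowed, this is the (8 * gap)-byte write mentioned above
        f.write(struct.pack('!d', math.nan) * max(gap, 0))
        f.write(struct.pack('!d', value))
    return True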

Rollup Aggregation

Expected features such as roll-up aggregation and data expiration are not provided by Ceres itself, but instead are implemented as maintenance plugins.
One such rollup plugin exists for Ceres; it aggregates data points in a manner similar to Whisper archives: multiple data points are collapsed and written to a lower-precision slice, and data points outside the configured slice retentions are trimmed. By default an average function is used, but alternative methods can be chosen by changing the node's metadata.
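As an illustration of the default averaging behavior (not the plugin's actual code), collapsing fine-resolution values into a coarser slice might look like this:

def rollup(values, factor):
    # Collapse each group of `factor` consecutive values into their average,
    # e.g. factor=2 turns 30s-resolution data into 60s-resolution data
    rolled = []
    for i in range(0, len(values) - len(values) % factor, factor):
        group = [v for v in values[i:i + factor] if v is not None]
        rolled.append(sum(group) / len(group) if group else None)
    return rolled

print(rollup([1.0, 3.0, 5.0, None, 2.0, 4.0], 2))  # [2.0, 5.0, 3.0]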

Retrieval Behavior

When data is retrieved (scoped by a time range), the first slice that has data within the requested interval is used. If the time period overlaps a slice boundary, both slices are read and their values joined together; any missing data between them is filled with null data points.
There is currently no support in Ceres for handling slices with mixed resolutions in the same way that is done with Whisper archives.
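A sketch of that join, assuming each slice has already been decoded into a mapping of timestamp to value (for example, dict(read_slice(path)) using the earlier sketch); the helper is hypothetical:

def read_range(slices, from_time, until_time, resolution):
    # slices: list of {timestamp: value} dicts, ordered oldest to newest
    # so that newer slices override older ones where they overlap
    merged = {}
    for slice_points in slices:
        merged.update(slice_points)

    # Walk the requested interval; timestamps missing from every slice
    # come back as None, the null data points described above
    return [merged.get(ts) for ts in range(from_time, until_time, resolution)]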

Database Format

CeresSlice    Data
Data          Point+
Data types in Python’s struct format:
Point    !d
Metadata for Ceres is stored in JSON format:
{"retentions": [[30, 1440]], "timeStep": 30, "xFilesFactor": 0.5, "aggregationMethod": "average"}
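Because the metadata is plain JSON, it can be inspected with the standard library; the '.ceres-node' filename is taken from the reference Ceres implementation, and the metric path is made up:

import json

with open('/opt/graphite/storage/ceres/carbon/agents/host-a/cpuUsage/.ceres-node') as f:
    metadata = json.load(f)

print(metadata['timeStep'])                          # finest writable resolution, in seconds
print(metadata.get('aggregationMethod', 'average'))  # rollup method, default average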
