Skip to main content

Graphite and your data

Getting your data into Graphite is very flexible. There are three main methods for sending data to Graphite: Plaintext, Pickle, and AMQP.
It’s worth noting that data sent to Graphite is actually sent to the Carbon and Carbon-Relay, which then manage the data. The Graphite web interface reads this data back out, either from cache or straight off disk.
Choosing the right transfer method for you is dependent on how you want to build your application or script to send data:
  • There are some tools and APIs which can help you get your data into Carbon.
  • For a singular script, or for test data, the plaintext protocol is the most straightforward method.
  • For sending large amounts of data, you’ll want to batch this data up and send it to Carbon’s pickle receiver.
  • Finally, Carbon can listen to a message bus, via AMQP.

The plaintext protocol

The plaintext protocol is the most straightforward protocol supported by Carbon.
The data sent must be in the following format: <metric path> <metric value> <metric timestamp>. Carbon will then help translate this line of text into a metric that the web interface and Whisper understand.
On Unix, the nc program (netcat) can be used to create a socket and send data to Carbon (by default, ‘plaintext’ runs on port 2003):
If you use the OpenBSD implementation of netcat, please follow this example:
PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc -q0 ${SERVER} ${PORT}
The -q0 parameter instructs nc to close socket once data is sent. Without this option, some nc versions would keep the connection open.
If you use the GNU implementation of netcat, please follow this example:
PORT=2003
SERVER=graphite.your.org
echo "local.random.diceroll 4 `date +%s`" | nc -c ${SERVER} ${PORT}
The -c parameter instructs nc to close socket once data is sent. Without this option, nc will keep the connection open and won’t end.

The pickle protocol

The pickle protocol is a much more efficient take on the plaintext protocol, and supports sending batches of metrics to Carbon in one go.
The general idea is that the pickled data forms a list of multi-level tuples:
[(path, (timestamp, value)), ...]
Once you’ve formed a list of sufficient size (don’t go too big!), and pickled it (if your client is running a more recent version of python than your server, you may need to specify the protocol) send the data over a socket to Carbon’s pickle receiver (by default, port 2004). You’ll need to pack your pickled data into a packet containing a simple header:
payload = pickle.dumps(listOfMetricTuples, protocol=2)
header = struct.pack("!L", len(payload))
message = header + payload
You would then send the message object through a network socket.

Using AMQP

When AMQP_METRIC_NAME_IN_BODY is set to True in your carbon.conf file, the data should be of the same format as the plaintext protocol, e.g. echo “local.random.diceroll 4 date +%s”. When AMQP_METRIC_NAME_IN_BODY is set to False, you should omit ‘local.random.diceroll’.

Getting Your Data Into Graphite

The Basic Idea

Graphite is useful if you have some numeric values that change over time and you want to graph them. Basically you write a program to collect these numeric values which then sends them to graphite’s backend, Carbon.

Step 1 - Plan a Naming Hierarchy

Everything stored in graphite has a path with components delimited by dots. So for example, website.orbitz.bookings.air or something like that would represent the number of air bookings on orbitz. Before producing your data you need to decide what your naming scheme will be. In a path such as “foo.bar.baz”, each thing surrounded by dots is called a path component. So “foo” is a path component, as well as “bar”, etc.
Each path component should have a clear and well-defined purpose. Volatile path components should be kept as deep into the hierarchy as possible.

Step 2 - Configure your Data Retention

Graphite is built on fixed-size databases (see Whisper.) so we have to configure in advance how much data we intend to store and at what level of precision. For instance you could store your data with 1-minute precision (meaning you will have one data point for each minute) for say 2 hours. Additionally you could store your data with 10-minute precision for 2 weeks, etc. The idea is that the storage cost is determined by the number of data points you want to store, the less fine your precision, the more time you can cover with fewer points. To determine the best retention configuration, you must answer all of the following questions.
  1. How often can you produce your data?
  2. What is the finest precision you will require?
  3. How far back will you need to look at that level of precision?
  4. What is the coarsest precision you can use?
  5. How far back would you ever need to see data? (yes it has to be finite, and determined ahead of time)
Once you have picked your naming scheme and answered all of the retention questions, you need to create a schema by creating/editing the /opt/graphite/conf/storage-schemas.conf file.
The format of the schemas file is easiest to demonstrate with an example. Let’s say we’ve written a script to collect system load data from various servers, the naming scheme will be like so:
servers.HOSTNAME.METRIC
Where HOSTNAME will be the server’s hostname and METRIC will be something like cpu_load, mem_usage, open_files, etc. Also let’s say we want to store this data with minutely precision for 30 days, then at 15 minute precision for 10 years.
For details of implementing your schema, see the Configuring Carbon document.
Basically, when carbon receives a metric, it determines where on the filesystem the whisper data file should be for that metric. If the data file does not exist, carbon knows it has to create it, but since whisper is a fixed size database, some parameters must be determined at the time of file creation (this is the reason we’re making a schema). Carbon looks at the schemas file, and in order of priority (highest to lowest) looks for the first schema whose pattern matches the metric name. If no schema matches the default schema (2 hours of minutely data) is used. Once the appropriate schema is determined, carbon uses the retention configuration for the schema to create the whisper data file appropriately.

Step 3 - Understanding the Graphite Message Format

Graphite understands messages with this format:
metric_path value timestamp\n
metric_path is the metric namespace that you want to populate.
value is the value that you want to assign to the metric at this time.
timestamp is the number of seconds since unix epoch time.
A simple example of doing this from the unix terminal would look like this:
echo "test.bash.stats 42 `date +%s`" | nc graphite.example.com 2003
There are many tools that interact with Graphite. See the Tools page for some choices of tools that may be used to feed Graphite.

Comments

Popular posts from this blog

Terraform

Terraform is a tool for building, changing, and versioning infrastructure safely and efficiently. Terraform can manage existing and popular service providers as well as custom in-house solutions. Configuration files describe to Terraform the components needed to run a single application or your entire datacenter. Terraform generates an execution plan describing what it will do to reach the desired state, and then executes it to build the described infrastructure. As the configuration changes, Terraform is able to determine what changed and create incremental execution plans which can be applied. The infrastructure Terraform can manage includes low-level components such as compute instances, storage, and networking, as well as high-level components such as DNS entries, SaaS features, etc. The key features of Terraform are: Infrastructure as Code : Infrastructure is described using a high-level configuration syntax. This allows a blueprint of your datacenter to be versioned and

Java 8 coding challenge: Roy and Profile Picture

Problem:  Roy wants to change his profile picture on Facebook. Now Facebook has some restriction over the dimension of picture that we can upload. Minimum dimension of the picture can be  L x L , where  L  is the length of the side of square. Now Roy has  N  photos of various dimensions. Dimension of a photo is denoted as  W x H where  W  - width of the photo and  H  - Height of the photo When any photo is uploaded following events may occur: [1] If any of the width or height is less than L, user is prompted to upload another one. Print " UPLOAD ANOTHER " in this case. [2] If width and height, both are large enough and (a) if the photo is already square then it is accepted. Print " ACCEPTED " in this case. (b) else user is prompted to crop it. Print " CROP IT " in this case. (quotes are only for clarification) Given L, N, W and H as input, print appropriate text as output. Input: First line contains  L . Second line contains  N , number of

Salt stack issues

The function “state.apply” is running as PID Restart salt-minion with command:  service salt-minion restart No matching sls found for ‘init’ in env ‘base’ Add top.sls file in the directory where your main sls file is present. Create the file as follows: 1 2 3 base: 'web*' : - apache If the sls is present in a subdirectory elasticsearch/init.sls then write the top.sls as: 1 2 3 base: '*' : - elasticsearch.init How to execute saltstack-formulas create file  /srv/pillar/top.sls  with content: base : ' * ' : - salt create file  /srv/pillar/salt.sls  with content: salt : master : worker_threads : 2 fileserver_backend : - roots - git gitfs_remotes : - git://github.com/saltstack-formulas/epel-formula.git - git://github.com/saltstack-formulas/git-formula.git - git://github.com/saltstack-formulas/nano-formula.git - git://github.com/saltstack-f