
Multiple Metrics with Aria Operations Telegraf Custom Scripts

When using VMware Aria Operations, integrating telegraf can significantly enhance your monitoring capabilities, provide more extensive and customizable data collection, and help ensure the performance, availability, and efficiency of your infrastructure and applications.

Utilizing the telegraf agent, you can run custom scripts on the endpoint VM and collect custom data, which can then be consumed as a metric.

One very important constraint is that the script has to deliver exactly one int64 value.

Problem Statement

If you need to return multiple values, or decimal or floating-point values, you need a separate script for every single value, and you have to encode and decode any decimal or floating-point metrics.

Even though configuring and running multiple scripts is a doable approach, sometimes one script provides multiple metrics, and breaking that single script down into multiple ones is not an option.

The challenge now is how to put multiple metrics into one value and how to turn this one value back into multiple metrics. Basically, it is an encode/decode problem.

Solution

Let’s start with some basic math and recall how the decimal system works. To refresh your memory, I will deconstruct a large number into small pieces: 420230133702600. The following picture shows what this number looks like in the decimal system. I have truncated the sum expression for readability, but you get the point: the number is the sum of its digits, each multiplied by the power of 10 corresponding to its position.
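
To make the positional idea concrete, here is a minimal shell sketch that splits the number into its digits and rebuilds it as a sum of digit times power of 10. The variable names are my own choice for illustration and are not part of the telegraf setup.

```shell
# Sketch: rebuild a number from its digits and the matching powers of 10.
n=420230133702600
rest=$n      # digits that still have to be processed
power=1      # current power of 10 (10^0, 10^1, ...)
sum=0
while [ "$rest" -gt 0 ]; do
  digit=$(( rest % 10 ))           # rightmost remaining digit
  sum=$(( sum + digit * power ))   # add its positional value
  rest=$(( rest / 10 ))            # drop that digit
  power=$(( power * 10 ))          # move one position to the left
done
echo "$sum"
```

The loop ends with the same value it started from, which is all the encoding below relies on.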

The idea is very simple. I will encode several values into one larger number, using fixed positions within that single number (my working use case has two values, but it works for any number of values as long as the result fits into int64). The next picture shows this for four independent values, 7, 62, 230 and 4200, which give us one number: 7622304200.

Figure 02: Encoding four numbers into one number.

So how to do that encoding mathematically?

Depending on the length of the single numbers we need to determine the power of 10 at the position where this single number should start within the final value. 4200 starts at 10^0, 230 at 10^4, 62 at 10^7 and 7 at 10^9. The sum is our single value:

4200 * 10^0 +
230  * 10^4 +
62   * 10^7 +
7    * 10^9 = 4200 + 2300000 + 620000000 + 7000000000 = 7622304200
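
The same sum can be sketched in shell. The helper names pow10 and encode are hypothetical (they are not part of telegraf or Aria Operations), and the arguments are pairs of value and starting power of 10, exactly as in the sum above. I assume non-negative integers that fit into their slots.

```shell
# pow10 N: print 10^N using plain integer arithmetic
pow10() {
  p=1
  i=0
  while [ "$i" -lt "$1" ]; do
    p=$(( p * 10 ))
    i=$(( i + 1 ))
  done
  echo "$p"
}

# encode VALUE START [VALUE START ...]: sum of VALUE * 10^START
encode() {
  total=0
  while [ "$#" -ge 2 ]; do
    total=$(( total + $1 * $(pow10 "$2") ))
    shift 2
  done
  echo "$total"
}

encode 4200 0 230 4 62 7 7 9
```

Note that nothing here checks whether a value is wider than its slot; a value that is too large would silently spill into its left neighbor.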

And now, how to decode that number back into single values?

What we have now is one large number with four encoded values: n1, n2, n3, n4.

Figure 03: Encoding four numbers into one number and the resulting sum.

The math goes backwards this time, and we need two additional mathematical expressions:

  • floor() always rounds down and returns the largest integer less than or equal to a given number
  • the modulo (mod, or %) operation returns the remainder or signed remainder of a division

We start with the leftmost number: divide the large number by its starting power of 10 and apply the floor() function to the result. The numbers further to the right need a slightly different approach:

  • take the single large number modulo the power of 10 at which the previous number (its neighbor to the left) begins
  • divide the result of the previous step by the power of 10 at which the number we want to extract begins
  • apply the floor() function to the result of the previous step

n1 = floor(7622304200 / 10^9) = 7
n2 = floor((7622304200 mod 10^9) / 10^7) = 62
n3 = floor((7622304200 mod 10^7) / 10^4) = 230
n4 = floor((7622304200 mod 10^4) / 10^0) = 4200
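
The decoding steps can be sketched in shell as well. The function names pow10 and extract are again my own, hypothetical helpers. Shell integer division truncates, which is the same as floor() for the non-negative numbers assumed here.

```shell
# pow10 N: print 10^N using plain integer arithmetic
pow10() {
  p=1
  i=0
  while [ "$i" -lt "$1" ]; do
    p=$(( p * 10 ))
    i=$(( i + 1 ))
  done
  echo "$p"
}

# extract NUMBER START [LEFT_START]
#   START:      power of 10 at which the wanted value begins
#   LEFT_START: power of 10 at which its left neighbor begins
#               (leave it out for the leftmost value)
extract() {
  number=$1
  if [ -n "$3" ]; then
    number=$(( number % $(pow10 "$3") ))   # cut off everything to the left
  fi
  echo $(( number / $(pow10 "$2") ))       # cut off everything to the right
}

extract 7622304200 9      # n1
extract 7622304200 7 9    # n2
extract 7622304200 4 7    # n3
extract 7622304200 0 4    # n4
```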

How to do all of it in Aria Operations and telegraf?

In my easy-to-follow example, I need to get two metrics from a virtual machine using a telegraf custom script. For simplicity, these are CPU usage in % with values between 0.0 and 100.0, and memory usage in MB, which theoretically ranges from 0 to 1816 according to the configuration of my VM. I know we have these metrics in Aria Operations OOTB, but this is just an example.

First of all, we need to agree on a format to encode both metrics, as shown in the next picture. Because the CPU usage can reach 100.0% and we need to get rid of the decimal place, we multiply every CPU usage value by 10; the maximum then becomes 1000, so we need four positions for this metric.
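
As a small illustration of the "multiply by 10" step, the following sketch turns a reading with one decimal place into an integer number of tenths. The function name to_tenths is hypothetical, the sample value is made up, and I use awk here only for the sketch; rounding to the nearest integer guards against floating-point results such as 122.999.

```shell
# to_tenths VALUE: convert e.g. 12.3 into 123 (tenths of a percent)
to_tenths() {
  # LC_ALL=C keeps "." as the decimal separator regardless of the system locale
  LC_ALL=C awk -v v="$1" 'BEGIN { print int(v * 10 + 0.5) }'
}

to_tenths 12.3
```

To get the original value back later, divide the decoded integer by 10.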

Figure 04: Encoding two numbers into one number and their positions.

The steps are as follows:

  1. Convert the decimal value into an integer. It has one decimal place of precision, so simply multiply by 10
  2. Combine both values into one value. Again, my assumptions:
    • the first number (CPU usage * 10) will be 0 <= n1 <= 1000, thus four digits
    • the second number (memory usage in MB) will be (due to my config) 0 <= n2 <= 1816, thus also four digits

This is the shell script to calculate both values and encode them into one int64 number.

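#!/bin/bash
# This script returns the current CPU and memory usage encoded in one integer:
# memory usage (MB) from 10^4 upwards, CPU usage * 10 in the lowest four digits.

# CPU usage in %: second field of the Cpu line printed by top
cpuUsage=$(top -bn1 | awk '/Cpu/ { print $2 }')
# Used memory in MB: third field of the Mem line printed by free
memUsage=$(free -m | awk '/Mem/ { print $3 }')

# echo "CPU Usage: $cpuUsage%"
# echo "Memory Usage: $memUsage MB"

n1=$cpuUsage
n2=$memUsage

# Calculate the sum using bc; n1*10 removes the decimal place
sum=$(echo "($n1*10*10^0)+($n2*10^4)" | bc)

# Print the result
# echo "Sum: $sum"

# Strip the fractional part that bc leaves behind and print the integer
output=${sum%.*}
echo "$output"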
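
One thing the format depends on is that the CPU value never needs more than its four positions. The following sketch is an optional hardening idea, not part of the script in the repository: the hypothetical function encode_cpu_mem takes the CPU usage already converted to tenths of a percent plus the memory usage in MB, and refuses to encode values that would break the layout.

```shell
# encode_cpu_mem CPU_TENTHS MEM_MB: both arguments are non-negative integers
encode_cpu_mem() {
  cpu_tenths=$1
  mem_mb=$2
  # CPU lives in digits 0-3; anything above 9999 would spill into the memory digits
  if [ "$cpu_tenths" -lt 0 ] || [ "$cpu_tenths" -gt 9999 ]; then
    echo "CPU value does not fit into four digits: $cpu_tenths" >&2
    return 1
  fi
  if [ "$mem_mb" -lt 0 ]; then
    echo "memory value must not be negative: $mem_mb" >&2
    return 1
  fi
  # Same layout as above: memory from 10^4 upwards, CPU in the lowest four digits
  echo $(( cpu_tenths + mem_mb * 10000 ))
}

encode_cpu_mem 123 1816
```

Memory is the leftmost value, so it is only limited by int64 itself.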

Now we can configure the script as a telegraf custom script, as shown in the next picture, where I run telegraf on a Linux VM.

Figure 05: Configuration of the telegraf custom script.

After a few minutes you will see the new metric coming in.

Figure 06: Telegraf custom script and its new metric – the single large number.

As the last task, we need to extract, or decode, the single values for CPU and memory usage from this number. Aria Operations Super Metrics are the best way to do this.

The next two pictures show both Super Metrics. It is important to know that these are not so-called THIS Super Metrics, because the metric provided by the custom script is not added to the VM object itself but to the Custom Script object related to the VM, thus the depth=0 in the Super Metric formula.

Figure 07: Super Metric to decode the first number – memory usage.
Figure 08: Super Metric to decode the second number – CPU usage.
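
If you want to double-check the numbers outside of Aria Operations, the arithmetic behind both decodings can be sketched in shell. This is explicitly not Super Metric syntax (the actual formulas are in the pictures and in the repository linked below), and the encoded sample value is made up for illustration.

```shell
# Hypothetical sample as returned by the custom script: 1816 MB memory, 12.3 % CPU
encoded=18160123

# First (leftmost) number: memory usage in MB, starts at 10^4
mem_mb=$(( encoded / 10000 ))

# Second number: CPU usage in tenths of a percent, the lowest four digits
cpu_tenths=$(( encoded % 10000 ))

# Undo the "multiply by 10" step without floating point: 123 -> 12.3
cpu_pct="$(( cpu_tenths / 10 )).$(( cpu_tenths % 10 ))"

echo "memory: ${mem_mb} MB, cpu: ${cpu_pct} %"
```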

You can find the script and the Super Metrics here: https://github.com/tkopton/aria-operations-content/tree/main/telegraf-script-multimetric

The final result is shown in the next picture.

Figure 09: Both Super Metrics and the single large number returned by the custom script.

Stay safe.

Thomas – https://twitter.com/ThomasKopton

2 Comments

  1. Hi Thomas,

    Nice job! I created 58 PowerShell scripts, but only 10 are running without problems. Most of them show the following message in the console: “Agent executescript activation failed.”

    The script checks the state of a web server. I followed the instructions in this KB: https://docs.vmware.com/en/VMware-Aria-Operations/8.17.1/Configuring-Operations/GUID-1713DEED-7A9A-43C5-944D-9A1DEB8A5859.html.

    Do you have any tips for this situation?

    1. Are the failed scripts always the same ones, or are they failing randomly with this error message?
